Overview


Dataset statistics

Number of variables: 42
Number of observations: 98879
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 32
Duplicate rows (%): < 0.1%
Total size in memory: 32.4 MiB
Average record size in memory: 344.0 B
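The derived figures in this block follow from the raw counts. The sketch below recomputes the duplicate-row share and the average record size from the numbers listed above. The variable names are illustrative, and because the total size is only reported to one decimal place, the recomputed record size is expected to agree with the reported 344.0 B only to within about one byte.

```python
# Figures copied from the dataset statistics above.
n_rows = 98_879
n_duplicates = 32
total_mib = 32.4  # rounded to one decimal place in the report

# Share of duplicate rows, in percent (the report shows it as "< 0.1%").
dup_pct = 100 * n_duplicates / n_rows

# Average bytes per record: total bytes divided by the number of rows.
avg_record_bytes = total_mib * 1024**2 / n_rows

print(f"duplicate rows: {dup_pct:.3f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```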

Variable types

Numeric: 8
Categorical: 33
Text: 1

Alerts

Dataset has 32 (< 0.1%) duplicate rows [Duplicates]
age is highly overall correlated with detailed_household_and_family_stat and 2 other fields [High correlation]
citizenship is highly overall correlated with country_of_birth_father and 2 other fields [High correlation]
class_of_worker is highly overall correlated with detailed_industry_recode and 4 other fields [High correlation]
country_of_birth_father is highly overall correlated with citizenship and 3 other fields [High correlation]
country_of_birth_mother is highly overall correlated with citizenship and 3 other fields [High correlation]
country_of_birth_self is highly overall correlated with citizenship and 2 other fields [High correlation]
detailed_household_and_family_stat is highly overall correlated with age and 4 other fields [High correlation]
detailed_household_summary_in_household is highly overall correlated with detailed_household_and_family_stat and 3 other fields [High correlation]
detailed_industry_recode is highly overall correlated with class_of_worker and 2 other fields [High correlation]
detailed_occupation_recode is highly overall correlated with class_of_worker and 2 other fields [High correlation]
education is highly overall correlated with tax_filer_stat and 1 other field [High correlation]
family_members_under_18 is highly overall correlated with detailed_household_and_family_stat and 3 other fields [High correlation]
fill_inc_questionnaire_for_veteran's_admin is highly overall correlated with veterans_benefits [High correlation]
full_or_part_time_employment_stat is highly overall correlated with live_in_this_house_1_year_ago and 2 other fields [High correlation]
hispanic_origin is highly overall correlated with country_of_birth_father and 1 other field [High correlation]
live_in_this_house_1_year_ago is highly overall correlated with full_or_part_time_employment_stat and 6 other fields [High correlation]
major_industry_code is highly overall correlated with class_of_worker and 3 other fields [High correlation]
major_occupation_code is highly overall correlated with class_of_worker and 3 other fields [High correlation]
marital_stat is highly overall correlated with tax_filer_stat [High correlation]
migration_code_change_in_msa is highly overall correlated with live_in_this_house_1_year_ago and 5 other fields [High correlation]
migration_code_change_in_reg is highly overall correlated with full_or_part_time_employment_stat and 4 other fields [High correlation]
migration_code_move_within_reg is highly overall correlated with live_in_this_house_1_year_ago and 5 other fields [High correlation]
migration_prev_res_in_sunbelt is highly overall correlated with live_in_this_house_1_year_ago and 3 other fields [High correlation]
num_persons_worked_for_employer is highly overall correlated with class_of_worker and 2 other fields [High correlation]
region_of_previous_residence is highly overall correlated with live_in_this_house_1_year_ago and 3 other fields [High correlation]
tax_filer_stat is highly overall correlated with age and 8 other fields [High correlation]
veterans_benefits is highly overall correlated with age and 6 other fields [High correlation]
weeks_worked_in_year is highly overall correlated with num_persons_worked_for_employer and 1 other field [High correlation]
year is highly overall correlated with full_or_part_time_employment_stat and 4 other fields [High correlation]
enroll_in_edu_inst_last_wk is highly imbalanced (74.3%) [Imbalance]
race is highly imbalanced (61.8%) [Imbalance]
hispanic_origin is highly imbalanced (71.6%) [Imbalance]
member_of_a_labor_union is highly imbalanced (67.5%) [Imbalance]
reason_for_unemployment is highly imbalanced (89.1%) [Imbalance]
region_of_previous_residence is highly imbalanced (78.5%) [Imbalance]
migration_code_move_within_reg is highly imbalanced (54.9%) [Imbalance]
migration_prev_res_in_sunbelt is highly imbalanced (70.5%) [Imbalance]
family_members_under_18 is highly imbalanced (50.3%) [Imbalance]
country_of_birth_father is highly imbalanced (70.6%) [Imbalance]
country_of_birth_mother is highly imbalanced (71.2%) [Imbalance]
country_of_birth_self is highly imbalanced (81.5%) [Imbalance]
citizenship is highly imbalanced (65.2%) [Imbalance]
own_business_or_self_employed is highly imbalanced (67.5%) [Imbalance]
fill_inc_questionnaire_for_veteran's_admin is highly imbalanced (94.3%) [Imbalance]
target is highly imbalanced (66.2%) [Imbalance]
dividends_from_stocks is highly skewed (γ1 = 25.25847346) [Skewed]
age has 1358 (1.4%) zeros [Zeros]
wage_per_hour has 93295 (94.4%) zeros [Zeros]
capital_gains has 95157 (96.2%) zeros [Zeros]
capital_losses has 96971 (98.1%) zeros [Zeros]
dividends_from_stocks has 88351 (89.4%) zeros [Zeros]
num_persons_worked_for_employer has 47009 (47.5%) zeros [Zeros]
weeks_worked_in_year has 47009 (47.5%) zeros [Zeros]
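The imbalance percentages can be read as one minus the normalized Shannon entropy of the category shares. The report does not state this formula, so treat it as an assumption; it does reproduce the 74.3% and 61.8% figures from the category counts listed for enroll_in_edu_inst_last_wk and race. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the class shares.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    This is an assumed definition, not taken from the report itself.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Category counts as listed in this report.
enroll_score = imbalance_score([92550, 3499, 2830])             # enroll_in_edu_inst_last_wk
race_score = imbalance_score([82805, 10061, 2909, 1899, 1205])  # race

print(f"enroll_in_edu_inst_last_wk: {100 * enroll_score:.1f}%")
print(f"race: {100 * race_score:.1f}%")
```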

Reproduction

Analysis started: 2025-01-20 00:38:13.969372
Analysis finished: 2025-01-20 00:38:47.476336
Duration: 33.51 seconds
Software version: ydata-profiling v4.12.1
Download configuration: config.json
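As a quick consistency check, the reported duration is the difference between the two timestamps above. A minimal sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2025-01-20 00:38:13.969372")
finished = datetime.fromisoformat("2025-01-20 00:38:47.476336")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```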

Variables

age
Real number (ℝ)

High correlation  Zeros 

Distinct: 91
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.868668
Minimum: 0
Maximum: 90
Zeros: 1358
Zeros (%): 1.4%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 16
Median: 33
Q3: 50
95-th percentile: 75
Maximum: 90
Range: 90
Interquartile range (IQR): 34

Descriptive statistics

Standard deviation: 22.275233
Coefficient of variation (CV): 0.63883234
Kurtosis: -0.73163274
Mean: 34.868668
Median Absolute Deviation (MAD): 17
Skewness: 0.36351952
Sum: 3447779
Variance: 496.18599
Monotonicity: Not monotonic
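Several of the descriptive statistics for age are tied together by simple identities: the mean is the sum divided by the number of observations, the variance is the squared standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below checks those identities against the reported figures; small differences are expected because the report rounds each value to about eight significant digits.

```python
import math

# Figures copied from the age statistics above.
n = 98_879
total = 3_447_779
mean = 34.868668
std = 22.275233
variance = 496.18599
cv = 0.63883234

# Recompute each derived statistic from the others.
mean_from_sum = total / n
variance_from_std = std ** 2
cv_from_ratio = std / mean

print(mean_from_sum, variance_from_std, cv_from_ratio)
```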
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
33                  1750    1.8%
34                  1720    1.7%
35                  1670    1.7%
37                  1646    1.7%
38                  1638    1.7%
32                  1628    1.6%
3                   1628    1.6%
30                  1614    1.6%
4                   1612    1.6%
31                  1605    1.6%
Other values (81)   82368   83.3%

Value   Count   Frequency (%)
0       1358    1.4%
1       1448    1.5%
2       1494    1.5%
3       1628    1.6%
4       1612    1.6%
5       1576    1.6%
6       1462    1.5%
7       1543    1.6%
8       1514    1.5%
9       1475    1.5%

Value   Count   Frequency (%)
90      373     0.4%
89      109     0.1%
88      141     0.1%
87      155     0.2%
86      162     0.2%
85      216     0.2%
84      267     0.3%
83      276     0.3%
82      324     0.3%
81      336     0.3%

class_of_worker
Categorical

High correlation 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Not in universe: 49199
Private sector: 36068
Government: 7405
Self-employed: 5928
Not employed: 279

Length

Max length: 15
Median length: 14
Mean length: 14.132414
Min length: 10

Characters and Unicode

Total characters: 1397399
Distinct characters: 23
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Private sector
2nd row: Self-employed
3rd row: Not in universe
4th row: Private sector
5th row: Private sector

Common Values

Value             Count   Frequency (%)
Not in universe   49199   49.8%
Private sector    36068   36.5%
Government        7405    7.5%
Self-employed     5928    6.0%
Not employed      279     0.3%
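The length and character totals for class_of_worker can be recovered from the category counts alone, since every row holds one of five known strings. The sketch below recomputes the total character count, the mean length, and the median length; the helper name `median_length` is illustrative, and it returns the lower median, which coincides with the median here because the row count is odd.

```python
# Category counts for class_of_worker, as listed in this report.
counts = {
    "Not in universe": 49199,
    "Private sector": 36068,
    "Government": 7405,
    "Self-employed": 5928,
    "Not employed": 279,
}

n_rows = sum(counts.values())
total_chars = sum(len(value) * c for value, c in counts.items())
mean_length = total_chars / n_rows

def median_length(counts):
    """Lower median of string lengths, weighted by how often each string occurs."""
    n = sum(counts.values())
    midpoint = (n + 1) // 2
    running = 0
    for length, c in sorted((len(v), c) for v, c in counts.items()):
        running += c
        if running >= midpoint:
            return length

print(n_rows, total_chars, round(mean_length, 6), median_length(counts))
```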

Length

Histogram of lengths of the category

Common Values (Plot)

Value           Count   Frequency (%)
not             49478   21.2%
in              49199   21.1%
universe        49199   21.1%
private         36068   15.4%
sector          36068   15.4%
government      7405    3.2%
self-employed   5928    2.5%
employed        279     0.1%

Most occurring characters

ValueCountFrequency (%)
e 203686
14.6%
134745
9.6%
i 134466
9.6%
t 129019
9.2%
r 128740
9.2%
n 113208
8.1%
o 99158
7.1%
v 92672
6.6%
s 85267
 
6.1%
N 49478
 
3.5%
Other values (13) 226960
16.2%

Most occurring categories

Value              Count     Frequency (%)
Lowercase Letter   1157847   82.9%
Space Separator    134745    9.6%
Uppercase Letter   98879     7.1%
Dash Punctuation   5928      0.4%
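The category names in this table correspond to Unicode general categories (Ll, Zs, Lu, Pd). As a sketch of how such a tally can be produced, the snippet below uses Python's standard `unicodedata` module on the five class_of_worker values, weighting each character by its value's row count. Whether the profiling tool computes the table the same way is an assumption; the totals do match the figures above.

```python
import unicodedata
from collections import Counter

# Category counts for class_of_worker, as listed in this report.
counts = {
    "Not in universe": 49199,
    "Private sector": 36068,
    "Government": 7405,
    "Self-employed": 5928,
    "Not employed": 279,
}

# Tally Unicode general categories, weighting each character by the
# number of rows that contain its value.
category_totals = Counter()
for value, n in counts.items():
    for ch in value:
        category_totals[unicodedata.category(ch)] += n

for code, total in category_totals.most_common():
    print(code, total)
```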

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 203686
17.6%
i 134466
11.6%
t 129019
11.1%
r 128740
11.1%
n 113208
9.8%
o 99158
8.6%
v 92672
8.0%
s 85267
7.4%
u 49199
 
4.2%
c 36068
 
3.1%
Other values (7) 86364
7.5%
Uppercase Letter
ValueCountFrequency (%)
N 49478
50.0%
P 36068
36.5%
G 7405
 
7.5%
S 5928
 
6.0%
Space Separator
ValueCountFrequency (%)
134745
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 5928
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1256726
89.9%
Common 140673
 
10.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 203686
16.2%
i 134466
10.7%
t 129019
10.3%
r 128740
10.2%
n 113208
9.0%
o 99158
7.9%
v 92672
7.4%
s 85267
6.8%
N 49478
 
3.9%
u 49199
 
3.9%
Other values (11) 171833
13.7%
Common
ValueCountFrequency (%)
134745
95.8%
- 5928
 
4.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1397399
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 203686
14.6%
134745
9.6%
i 134466
9.6%
t 129019
9.2%
r 128740
9.2%
n 113208
8.1%
o 99158
7.1%
v 92672
6.6%
s 85267
 
6.1%
N 49478
 
3.5%
Other values (13) 226960
16.2%

detailed_industry_recode
Categorical

High correlation 

Distinct: 42
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Not in universe or children: 49403
Public administration: 8933
Manufacturing: 4710
Business and repair services: 3114
Manufacturing-durable goods: 3066
Other values (37): 29653

Length

Max length: 58
Median length: 27
Mean length: 24.870377
Min length: 5

Characters and Unicode

Total characters: 2459158
Distinct characters: 39
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Transportation
2nd row: Wholesale and retail trade
3rd row: Not in universe or children
4th row: Business and repair services
5th row: Manufacturing-durable goods

Common Values

Value                                    Count   Frequency (%)
Not in universe or children              49403   50.0%
Public administration                    8933    9.0%
Manufacturing                            4710    4.8%
Business and repair services             3114    3.1%
Manufacturing-durable goods              3066    3.1%
Wholesale and retail trade               2434    2.5%
Public administration and armed forces   2304    2.3%
Trade                                    2204    2.2%
Professional services                    2142    2.2%
Professional and related services        1964    2.0%
Other values (32)                        18605   18.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not 50224
13.6%
or 49403
13.4%
children 49403
13.4%
in 49403
13.4%
universe 49403
13.4%
services 13869
 
3.8%
and 13688
 
3.7%
public 11788
 
3.2%
administration 11237
 
3.0%
trade 5335
 
1.4%
Other values (45) 65783
17.8%

Most occurring characters

ValueCountFrequency (%)
270657
11.0%
i 259576
10.6%
n 242723
9.9%
e 238935
 
9.7%
r 235705
 
9.6%
o 148682
 
6.0%
s 136871
 
5.6%
t 114552
 
4.7%
a 105972
 
4.3%
u 103385
 
4.2%
Other values (29) 602100
24.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2073565
84.3%
Space Separator 270657
 
11.0%
Uppercase Letter 100308
 
4.1%
Other Punctuation 8274
 
0.3%
Dash Punctuation 6354
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 259576
12.5%
n 242723
11.7%
e 238935
11.5%
r 235705
11.4%
o 148682
 
7.2%
s 136871
 
6.6%
t 114552
 
5.5%
a 105972
 
5.1%
u 103385
 
5.0%
c 99350
 
4.8%
Other values (11) 387814
18.7%
Uppercase Letter
ValueCountFrequency (%)
N 50224
50.1%
P 17692
 
17.6%
M 12908
 
12.9%
B 4570
 
4.6%
T 4219
 
4.2%
W 3393
 
3.4%
H 2081
 
2.1%
F 1081
 
1.1%
E 1037
 
1.0%
S 804
 
0.8%
Other values (5) 2299
 
2.3%
Space Separator
ValueCountFrequency (%)
270657
100.0%
Other Punctuation
ValueCountFrequency (%)
, 8274
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6354
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2173873
88.4%
Common 285285
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 259576
11.9%
n 242723
11.2%
e 238935
11.0%
r 235705
10.8%
o 148682
 
6.8%
s 136871
 
6.3%
t 114552
 
5.3%
a 105972
 
4.9%
u 103385
 
4.8%
c 99350
 
4.6%
Other values (26) 488122
22.5%
Common
ValueCountFrequency (%)
270657
94.9%
, 8274
 
2.9%
- 6354
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2459158
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
270657
11.0%
i 259576
10.6%
n 242723
9.9%
e 238935
 
9.7%
r 235705
 
9.6%
o 148682
 
6.0%
s 136871
 
5.6%
t 114552
 
4.7%
a 105972
 
4.3%
u 103385
 
4.2%
Other values (29) 602100
24.5%

detailed_occupation_recode
Categorical

High correlation 

Distinct: 47
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Not in universe: 49403
Other executive, admin and managerial: 4356
Food service occupations: 3814
Computer equipment operators: 2771
Personal service occupations: 2670
Other values (42): 35865

Length

Max length: 46
Median length: 43
Mean length: 23.328037
Min length: 9

Characters and Unicode

Total characters: 2306653
Distinct characters: 40
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Construction trades
2nd row: Other professional specialty occupations
3rd row: Not in universe
4th row: Management related occupations
5th row: Automobile mechanics and repairers

Common Values

Value                                            Count   Frequency (%)
Not in universe                                  49403   50.0%
Other executive, admin and managerial            4356    4.4%
Food service occupations                         3814    3.9%
Computer equipment operators                     2771    2.8%
Personal service occupations                     2670    2.7%
Construction trades                              2093    2.1%
Automobile mechanics and repairers               2022    2.0%
Teachers, except college and university          1806    1.8%
Supervisors and proprietors, sales occupations   1742    1.8%
Forestry and fishing occupations                 1708    1.7%
Other values (37)                                26494   26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not 49403
14.9%
universe 49403
14.9%
in 49403
14.9%
occupations 24088
 
7.3%
and 23313
 
7.0%
other 10998
 
3.3%
service 9006
 
2.7%
operators 5122
 
1.5%
related 5025
 
1.5%
admin 4632
 
1.4%
Other values (83) 100365
30.3%

Most occurring characters

ValueCountFrequency (%)
e 252503
10.9%
231879
10.1%
i 210919
 
9.1%
n 206377
 
8.9%
o 166383
 
7.2%
t 159647
 
6.9%
r 159308
 
6.9%
s 155618
 
6.7%
a 129532
 
5.6%
u 101707
 
4.4%
Other values (30) 532780
23.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1965593
85.2%
Space Separator 231879
 
10.1%
Uppercase Letter 98879
 
4.3%
Other Punctuation 10302
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 252503
12.8%
i 210919
10.7%
n 206377
10.5%
o 166383
8.5%
t 159647
8.1%
r 159308
8.1%
s 155618
7.9%
a 129532
 
6.6%
u 101707
 
5.2%
c 97524
 
5.0%
Other values (15) 326075
16.6%
Uppercase Letter
ValueCountFrequency (%)
N 49603
50.2%
O 11274
 
11.4%
F 8529
 
8.6%
C 6430
 
6.5%
P 6343
 
6.4%
M 3725
 
3.8%
S 2676
 
2.7%
H 2422
 
2.4%
E 2192
 
2.2%
T 2177
 
2.2%
Other values (3) 3508
 
3.5%
Space Separator
ValueCountFrequency (%)
231879
100.0%
Other Punctuation
ValueCountFrequency (%)
, 10302
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2064472
89.5%
Common 242181
 
10.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 252503
12.2%
i 210919
10.2%
n 206377
10.0%
o 166383
 
8.1%
t 159647
 
7.7%
r 159308
 
7.7%
s 155618
 
7.5%
a 129532
 
6.3%
u 101707
 
4.9%
c 97524
 
4.7%
Other values (28) 424954
20.6%
Common
ValueCountFrequency (%)
231879
95.7%
, 10302
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2306653
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 252503
10.9%
231879
10.1%
i 210919
 
9.1%
n 206377
 
8.9%
o 166383
 
7.2%
t 159647
 
6.9%
r 159308
 
6.9%
s 155618
 
6.7%
a 129532
 
5.6%
u 101707
 
4.4%
Other values (30) 532780
23.1%

education
Categorical

High correlation 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
High School Graduate: 24141
Children: 22600
Below High School: 18733
Some College: 18719
College Graduate: 9884

Length

Max length: 20
Median length: 16
Mean length: 14.531731
Min length: 8

Characters and Unicode

Total characters: 1436883
Distinct characters: 24
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Below High School
2nd row: Some College
3rd row: Children
4th row: High School Graduate
5th row: High School Graduate

Common Values

Value                  Count   Frequency (%)
High School Graduate   24141   24.4%
Children               22600   22.9%
Below High School      18733   18.9%
Some College           18719   18.9%
College Graduate       9884    10.0%
Advanced Degree        4802    4.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
high 42874
19.7%
school 42874
19.7%
graduate 34025
15.6%
college 28603
13.1%
children 22600
10.4%
below 18733
8.6%
some 18719
8.6%
advanced 4802
 
2.2%
degree 4802
 
2.2%

Most occurring characters

ValueCountFrequency (%)
e 170491
11.9%
o 151803
 
10.6%
l 141413
 
9.8%
119153
 
8.3%
h 108348
 
7.5%
g 76279
 
5.3%
a 72852
 
5.1%
d 66229
 
4.6%
i 65474
 
4.6%
S 61593
 
4.3%
Other values (14) 403248
28.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1099698
76.5%
Uppercase Letter 218032
 
15.2%
Space Separator 119153
 
8.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 170491
15.5%
o 151803
13.8%
l 141413
12.9%
h 108348
9.9%
g 76279
6.9%
a 72852
6.6%
d 66229
 
6.0%
i 65474
 
6.0%
r 61427
 
5.6%
c 47676
 
4.3%
Other values (6) 137706
12.5%
Uppercase Letter
ValueCountFrequency (%)
S 61593
28.2%
C 51203
23.5%
H 42874
19.7%
G 34025
15.6%
B 18733
 
8.6%
A 4802
 
2.2%
D 4802
 
2.2%
Space Separator
ValueCountFrequency (%)
119153
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1317730
91.7%
Common 119153
 
8.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 170491
12.9%
o 151803
11.5%
l 141413
10.7%
h 108348
 
8.2%
g 76279
 
5.8%
a 72852
 
5.5%
d 66229
 
5.0%
i 65474
 
5.0%
S 61593
 
4.7%
r 61427
 
4.7%
Other values (13) 341821
25.9%
Common
ValueCountFrequency (%)
119153
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1436883
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 170491
11.9%
o 151803
 
10.6%
l 141413
 
9.8%
119153
 
8.3%
h 108348
 
7.5%
g 76279
 
5.3%
a 72852
 
5.1%
d 66229
 
4.6%
i 65474
 
4.6%
S 61593
 
4.3%
Other values (14) 403248
28.1%

wage_per_hour
Real number (ℝ)

Zeros 

Distinct: 894
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.947613
Minimum: 0
Maximum: 9900
Zeros: 93295
Zeros (%): 94.4%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 500
Maximum: 9900
Range: 9900
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 271.35721
Coefficient of variation (CV): 4.9384713
Kurtosis: 148.26554
Mean: 54.947613
Median Absolute Deviation (MAD): 0
Skewness: 8.7180739
Sum: 5433165
Variance: 73634.733
Monotonicity: Not monotonic
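The large positive skewness here is what one would expect from a column that is mostly zeros with a thin tail of large values. The sketch below illustrates this on a small made-up sample using the uncorrected moment definition of skewness, m3 / m2^1.5. Both the toy data and the choice of estimator are assumptions: the profiling tool may apply a bias correction, so this will not reproduce the reported 8.7180739 exactly even on the real column.

```python
from statistics import fmean

def moment_skewness(values):
    """Population (uncorrected) skewness: third central moment over variance**1.5."""
    values = list(values)
    mu = fmean(values)
    m2 = fmean([(v - mu) ** 2 for v in values])
    m3 = fmean([(v - mu) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical zero-inflated sample: 94 zeros plus six positive wages.
toy_wages = [0] * 94 + [500, 600, 700, 800, 1000, 1100]

# A symmetric sample for comparison; its skewness is zero.
symmetric = [-1, 0, 1]

print(moment_skewness(toy_wages))
print(moment_skewness(symmetric))
```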
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
0                    93295   94.4%
500                  360     0.4%
600                  285     0.3%
700                  281     0.3%
800                  249     0.3%
1000                 213     0.2%
425                  178     0.2%
900                  154     0.2%
550                  143     0.1%
1100                 125     0.1%
Other values (884)   3596    3.6%

Value   Count   Frequency (%)
0       93295   94.4%
100     2       < 0.1%
150     3       < 0.1%
178     1       < 0.1%
200     12      < 0.1%
205     1       < 0.1%
208     1       < 0.1%
209     1       < 0.1%
210     3       < 0.1%
211     2       < 0.1%

Value   Count   Frequency (%)
9900    2       < 0.1%
8831    1       < 0.1%
8800    1       < 0.1%
8000    2       < 0.1%
7700    1       < 0.1%
7500    1       < 0.1%
7400    1       < 0.1%
7000    2       < 0.1%
6500    2       < 0.1%
6000    1       < 0.1%

enroll_in_edu_inst_last_wk
Categorical

Imbalance 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Not in universe: 92550
High school: 3499
College or university: 2830

Length

Max length: 22
Median length: 16
Mean length: 16.030178
Min length: 12

Characters and Unicode

Total characters: 1585048
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Common Values

Value                   Count   Frequency (%)
Not in universe         92550   93.6%
High school             3499    3.5%
College or university   2830    2.9%
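The length statistics for this variable suggest that each value carries one extra character: "College or university" is 21 characters long, yet the reported maximum length is 22. A single leading space per value would account for this, and it also reproduces the reported total of 1585048 characters. That padding is an inference from the numbers, not something the report states. The sketch below checks the arithmetic and shows one way to normalize such values with `str.strip`.

```python
# Assumed raw values with one leading space each (an inference from the
# length statistics), paired with the counts listed in this report.
raw_counts = {
    " Not in universe": 92550,
    " High school": 3499,
    " College or university": 2830,
}

total_chars = sum(len(value) * c for value, c in raw_counts.items())
raw_max = max(len(value) for value in raw_counts)
raw_min = min(len(value) for value in raw_counts)

# Strip surrounding whitespace so that labels compare cleanly across columns.
clean_counts = {value.strip(): c for value, c in raw_counts.items()}

print(total_chars, raw_max, raw_min)
print(sorted(clean_counts))
```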

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 92550
31.6%
in 92550
31.6%
universe 92550
31.6%
high 3499
 
1.2%
school 3499
 
1.2%
college 2830
 
1.0%
or 2830
 
1.0%
university 2830
 
1.0%

Most occurring characters

ValueCountFrequency (%)
293138
18.5%
i 194259
12.3%
e 193590
12.2%
n 187930
11.9%
o 105208
 
6.6%
s 98879
 
6.2%
r 98210
 
6.2%
v 95380
 
6.0%
u 95380
 
6.0%
t 95380
 
6.0%
Other values (8) 127694
8.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1193031
75.3%
Space Separator 293138
 
18.5%
Uppercase Letter 98879
 
6.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 194259
16.3%
e 193590
16.2%
n 187930
15.8%
o 105208
8.8%
s 98879
8.3%
r 98210
8.2%
v 95380
8.0%
u 95380
8.0%
t 95380
8.0%
l 9159
 
0.8%
Other values (4) 19656
 
1.6%
Uppercase Letter
ValueCountFrequency (%)
N 92550
93.6%
H 3499
 
3.5%
C 2830
 
2.9%
Space Separator
ValueCountFrequency (%)
293138
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1291910
81.5%
Common 293138
 
18.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 194259
15.0%
e 193590
15.0%
n 187930
14.5%
o 105208
8.1%
s 98879
7.7%
r 98210
7.6%
v 95380
7.4%
u 95380
7.4%
t 95380
7.4%
N 92550
7.2%
Other values (7) 35144
 
2.7%
Common
ValueCountFrequency (%)
293138
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1585048
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
293138
18.5%
i 194259
12.3%
e 193590
12.2%
n 187930
11.9%
o 105208
 
6.6%
s 98879
 
6.2%
r 98210
 
6.2%
v 95380
 
6.0%
u 95380
 
6.0%
t 95380
 
6.0%
Other values (8) 127694
8.1%

marital_stat
Categorical

High correlation 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Married: 42422
Never Married: 42272
Divorced: 6450
Widowed: 5324
Separated: 1696

Length

Max length: 21
Median length: 13
Mean length: 9.7658451
Min length: 7

Characters and Unicode

Total characters: 965637
Distinct characters: 22
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Married
2nd row: Married
3rd row: Never Married
4th row: Divorced
5th row: Divorced

Common Values

Value                   Count   Frequency (%)
Married                 42422   42.9%
Never Married           42272   42.8%
Divorced                6450    6.5%
Widowed                 5324    5.4%
Separated               1696    1.7%
Married-spouse absent   715     0.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
married 84694
59.7%
never 42272
29.8%
divorced 6450
 
4.5%
widowed 5324
 
3.8%
separated 1696
 
1.2%
married-spouse 715
 
0.5%
absent 715
 
0.5%

Most occurring characters

ValueCountFrequency (%)
r 221236
22.9%
e 186549
19.3%
d 104203
10.8%
i 97183
10.1%
a 89516
9.3%
M 85409
 
8.8%
v 48722
 
5.0%
42987
 
4.5%
N 42272
 
4.4%
o 12489
 
1.3%
Other values (12) 35071
 
3.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 780784
80.9%
Uppercase Letter 141151
 
14.6%
Space Separator 42987
 
4.5%
Dash Punctuation 715
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 221236
28.3%
e 186549
23.9%
d 104203
13.3%
i 97183
12.4%
a 89516
11.5%
v 48722
 
6.2%
o 12489
 
1.6%
c 6450
 
0.8%
w 5324
 
0.7%
p 2411
 
0.3%
Other values (5) 6701
 
0.9%
Uppercase Letter
ValueCountFrequency (%)
M 85409
60.5%
N 42272
29.9%
D 6450
 
4.6%
W 5324
 
3.8%
S 1696
 
1.2%
Space Separator
ValueCountFrequency (%)
42987
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 715
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 921935
95.5%
Common 43702
 
4.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 221236
24.0%
e 186549
20.2%
d 104203
11.3%
i 97183
10.5%
a 89516
9.7%
M 85409
 
9.3%
v 48722
 
5.3%
N 42272
 
4.6%
o 12489
 
1.4%
c 6450
 
0.7%
Other values (10) 27906
 
3.0%
Common
ValueCountFrequency (%)
42987
98.4%
- 715
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 965637
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 221236
22.9%
e 186549
19.3%
d 104203
10.8%
i 97183
10.1%
a 89516
9.3%
M 85409
 
8.8%
v 48722
 
5.0%
42987
 
4.5%
N 42272
 
4.4%
o 12489
 
1.3%
Other values (12) 35071
 
3.6%

major_industry_code
Categorical

High correlation 

Distinct: 24
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Not in universe or children: 49403
Retail trade: 8711
Manufacturing-durable goods: 4445
Education: 4227
Manufacturing-nondurable goods: 3394
Other values (19): 28699

Length

Max length: 36
Median length: 28
Mean length: 24.316478
Min length: 7

Characters and Unicode

Total characters: 2404389
Distinct characters: 38
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row Manufacturing-durable goods
2nd row Business and repair services
3rd row Not in universe or children
4th row Transportation
5th row Construction

Common Values

Value                               Count   Frequency (%)
Not in universe or children         49403   50.0%
Retail trade                        8711    8.8%
Manufacturing-durable goods         4445    4.5%
Education                           4227    4.3%
Manufacturing-nondurable goods      3394    3.4%
Construction                        3066    3.1%
Finance insurance and real estate   3019    3.1%
Business and repair services        2985    3.0%
Medical except hospital             2304    2.3%
Transportation                      2211    2.2%
Other values (14)                   15114   15.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not 49403
13.8%
universe 49403
13.8%
or 49403
13.8%
children 49403
13.8%
in 49403
13.8%
services 10782
 
3.0%
trade 10533
 
2.9%
retail 8711
 
2.4%
goods 7839
 
2.2%
and 6676
 
1.9%
Other values (34) 67347
18.8%

Most occurring characters

ValueCountFrequency (%)
358903
14.9%
e 243705
10.1%
i 224203
 
9.3%
n 220082
 
9.2%
r 219335
 
9.1%
o 150196
 
6.2%
t 120058
 
5.0%
s 115646
 
4.8%
a 95172
 
4.0%
c 92888
 
3.9%
Other values (28) 564201
23.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1935840
80.5%
Space Separator 358903
 
14.9%
Uppercase Letter 101807
 
4.2%
Dash Punctuation 7839
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 243705
12.6%
i 224203
11.6%
n 220082
11.4%
r 219335
11.3%
o 150196
7.8%
t 120058
 
6.2%
s 115646
 
6.0%
a 95172
 
4.9%
c 92888
 
4.8%
u 92400
 
4.8%
Other values (11) 362155
18.7%
Uppercase Letter
ValueCountFrequency (%)
N 49403
48.5%
M 10468
 
10.3%
R 8711
 
8.6%
E 5048
 
5.0%
H 4763
 
4.7%
P 4128
 
4.1%
C 3661
 
3.6%
F 3137
 
3.1%
B 2985
 
2.9%
T 2211
 
2.2%
Other values (5) 7292
 
7.2%
Space Separator
ValueCountFrequency (%)
358903
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 7839
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2037647
84.7%
Common 366742
 
15.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 243705
12.0%
i 224203
11.0%
n 220082
10.8%
r 219335
10.8%
o 150196
 
7.4%
t 120058
 
5.9%
s 115646
 
5.7%
a 95172
 
4.7%
c 92888
 
4.6%
u 92400
 
4.5%
Other values (26) 463962
22.8%
Common
ValueCountFrequency (%)
358903
97.9%
- 7839
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2404389
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
358903
14.9%
e 243705
10.1%
i 224203
 
9.3%
n 220082
 
9.2%
r 219335
 
9.1%
o 150196
 
6.2%
t 120058
 
5.0%
s 115646
 
4.8%
a 95172
 
4.0%
c 92888
 
3.9%
Other values (28) 564201
23.5%

major_occupation_code
Categorical

High correlation 

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 MiB
Not in universe: 49403
Adm support including clerical: 7252
Professional specialty: 6869
Executive admin and managerial: 6288
Other service: 6176
Other values (10): 22891

Length

Max length: 38
Median length: 36
Mean length: 20.779073
Min length: 6

Characters and Unicode

Total characters: 2054614
Distinct characters: 34
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row Machine operators assmblrs & inspctrs
2nd row Professional specialty
3rd row Not in universe
4th row Executive admin and managerial
5th row Precision production craft & repair

Common Values

Value                                   Count   Frequency (%)
Not in universe                         49403   50.0%
Adm support including clerical          7252    7.3%
Professional specialty                  6869    6.9%
Executive admin and managerial          6288    6.4%
Other service                           6176    6.2%
Sales                                   6021    6.1%
Precision production craft & repair     5354    5.4%
Machine operators assmblrs & inspctrs   3186    3.2%
Handlers equip cleaners etc             2070    2.1%
Transportation and material moving      2040    2.1%
Other values (5)                        4220    4.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
not 49403
16.0%
in 49403
16.0%
universe 49403
16.0%
and 11320
 
3.7%
support 8724
 
2.8%
8540
 
2.8%
clerical 7252
 
2.4%
adm 7252
 
2.4%
including 7252
 
2.4%
professional 6869
 
2.2%
Other values (33) 103051
33.4%

Most occurring characters

ValueCountFrequency (%)
310539
15.1%
i 205246
10.0%
e 203495
9.9%
n 177399
 
8.6%
r 149150
 
7.3%
s 128958
 
6.3%
t 107721
 
5.2%
o 103592
 
5.0%
a 100938
 
4.9%
u 79516
 
3.9%
Other values (24) 488060
23.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1636640
79.7%
Space Separator 310539
 
15.1%
Uppercase Letter 98895
 
4.8%
Other Punctuation 8540
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 205246
12.5%
e 203495
12.4%
n 177399
10.8%
r 149150
9.1%
s 128958
7.9%
t 107721
 
6.6%
o 103592
 
6.3%
a 100938
 
6.2%
u 79516
 
4.9%
c 72622
 
4.4%
Other values (12) 308003
18.8%
Uppercase Letter
ValueCountFrequency (%)
N 49403
50.0%
P 13435
 
13.6%
A 7268
 
7.3%
E 6288
 
6.4%
O 6176
 
6.2%
S 6021
 
6.1%
T 3512
 
3.6%
M 3186
 
3.2%
H 2070
 
2.1%
F 1536
 
1.6%
Space Separator
ValueCountFrequency (%)
310539
100.0%
Other Punctuation
ValueCountFrequency (%)
& 8540
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1735535
84.5%
Common 319079
 
15.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 205246
11.8%
e 203495
11.7%
n 177399
10.2%
r 149150
 
8.6%
s 128958
 
7.4%
t 107721
 
6.2%
o 103592
 
6.0%
a 100938
 
5.8%
u 79516
 
4.6%
c 72622
 
4.2%
Other values (22) 406898
23.4%
Common
ValueCountFrequency (%)
310539
97.3%
& 8540
 
2.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2054614
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
310539
15.1%
i 205246
10.0%
e 203495
9.9%
n 177399
 
8.6%
r 149150
 
7.3%
s 128958
 
6.3%
t 107721
 
5.2%
o 103592
 
5.0%
a 100938
 
4.9%
u 79516
 
3.9%
Other values (24) 488060
23.8%

race
Categorical

Imbalance 

Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
White 82805
Black 10061
Asian or Pacific Islander 2909
Other 1899
Amer Indian Aleut or Eskimo 1205

Length

Max length 28
Median length 6
Mean length 6.8565014
Min length 6

Characters and Unicode

Total characters 677964
Distinct characters 24
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
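
As a rough illustration of how such per-category character counts can be produced, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` and the three sample strings are assumptions made for this example; they are not taken from the profiling tool.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            # unicodedata.category returns a two-letter code,
            # e.g. "Ll" (lowercase letter), "Lu" (uppercase letter), "Zs" (space).
            counts[unicodedata.category(ch)] += 1
    return counts


# Hypothetical sample values; the leading space mimics the raw column values.
sample = [" White", " Black", " Asian or Pacific Islander"]
counts = category_counts(sample)
print(counts)
```

The report's labels "Lowercase Letter", "Uppercase Letter" and "Space Separator" correspond to the codes `Ll`, `Lu` and `Zs`.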

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row White
2nd row White
3rd row White
4th row White
5th row White

Common Values

Value Count Frequency (%)
White 82805 83.7%
Black 10061 10.2%
Asian or Pacific Islander 2909 2.9%
Other 1899 1.9%
Amer Indian Aleut or Eskimo 1205 1.2%
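
One common way to turn a frequency table like this into a single imbalance number is one minus the normalized Shannon entropy. The sketch below applies that formula to the counts above. Whether the "Imbalance" alert in this report uses exactly this formula or which threshold it applies is an assumption here, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - normalized entropy: 0 means balanced, values near 1 mean one class dominates.

    Assumes at least two classes (log2(1) would make the denominator zero).
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(counts))


# Counts for the five race categories, copied from the table above.
race_counts = [82805, 10061, 2909, 1899, 1205]
score = imbalance_score(race_counts)
print(round(score, 3))
```

A perfectly uniform column would score 0 under this definition, so a high score here reflects the 83.7% share of the largest category.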

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
white 82805
73.7%
black 10061
 
8.9%
or 4114
 
3.7%
asian 2909
 
2.6%
pacific 2909
 
2.6%
islander 2909
 
2.6%
other 1899
 
1.7%
amer 1205
 
1.1%
indian 1205
 
1.1%
aleut 1205
 
1.1%

Most occurring characters

ValueCountFrequency (%)
112426
16.6%
i 93942
13.9%
e 90023
13.3%
t 85909
12.7%
h 84704
12.5%
W 82805
12.2%
a 19993
 
2.9%
c 15879
 
2.3%
l 14175
 
2.1%
k 11266
 
1.7%
Other values (14) 66842
9.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 457226
67.4%
Space Separator 112426
 
16.6%
Uppercase Letter 108312
 
16.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 93942
20.5%
e 90023
19.7%
t 85909
18.8%
h 84704
18.5%
a 19993
 
4.4%
c 15879
 
3.5%
l 14175
 
3.1%
k 11266
 
2.5%
r 10127
 
2.2%
n 8228
 
1.8%
Other values (6) 22980
 
5.0%
Uppercase Letter
ValueCountFrequency (%)
W 82805
76.5%
B 10061
 
9.3%
A 5319
 
4.9%
I 4114
 
3.8%
P 2909
 
2.7%
O 1899
 
1.8%
E 1205
 
1.1%
Space Separator
ValueCountFrequency (%)
112426
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 565538
83.4%
Common 112426
 
16.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 93942
16.6%
e 90023
15.9%
t 85909
15.2%
h 84704
15.0%
W 82805
14.6%
a 19993
 
3.5%
c 15879
 
2.8%
l 14175
 
2.5%
k 11266
 
2.0%
r 10127
 
1.8%
Other values (13) 56715
10.0%
Common
ValueCountFrequency (%)
112426
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 677964
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
112426
16.6%
i 93942
13.9%
e 90023
13.3%
t 85909
12.7%
h 84704
12.5%
W 82805
12.2%
a 19993
 
2.9%
c 15879
 
2.3%
l 14175
 
2.1%
k 11266
 
1.7%
Other values (14) 66842
9.9%

hispanic_origin
Categorical

High correlation  Imbalance 

Distinct 10
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
All other 85084
Mexican-American 3981
Mexican (Mexicano) 3686
Central or South American 1985
Puerto Rican 1578
Other values (5) 2565

Length

Max length 26
Median length 10
Mean length 10.982989
Min length 3

Characters and Unicode

Total characters 1085987
Distinct characters 31
Distinct categories 6
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Mexican (Mexicano)
2nd row All other
3rd row Mexican-American
4th row All other
5th row All other

Common Values

Value Count Frequency (%)
All other 85084 86.0%
Mexican-American 3981 4.0%
Mexican (Mexicano) 3686 3.7%
Central or South American 1985 2.0%
Puerto Rican 1578 1.6%
Other Spanish 1243 1.3%
Cuban 613 0.6%
NA 400 0.4%
Chicano 169 0.2%
Do not know 140 0.1%
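
The variable summary for this column lists the five most frequent values plus an "Other values (5)" row of 2565. A minimal sketch of how such a row can be derived from the full table is shown below; the cut-off of five and the variable names are assumptions for illustration.

```python
# Category counts copied from the Common Values table above.
counts = {
    "All other": 85084,
    "Mexican-American": 3981,
    "Mexican (Mexicano)": 3686,
    "Central or South American": 1985,
    "Puerto Rican": 1578,
    "Other Spanish": 1243,
    "Cuban": 613,
    "NA": 400,
    "Chicano": 169,
    "Do not know": 140,
}

top_n = 5  # assumed number of categories shown individually
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
head, tail = ranked[:top_n], ranked[top_n:]

# Everything below the cut-off is lumped into one summary row.
other_total = sum(n for _, n in tail)
print(f"Other values ({len(tail)}) {other_total}")
```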

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
other 86327
43.9%
all 85084
43.3%
mexican-american 3981
 
2.0%
mexican 3686
 
1.9%
mexicano 3686
 
1.9%
central 1985
 
1.0%
or 1985
 
1.0%
south 1985
 
1.0%
american 1985
 
1.0%
rican 1578
 
0.8%
Other values (8) 4423
 
2.2%

Most occurring characters

ValueCountFrequency (%)
196705
18.1%
l 172153
15.9%
e 107209
9.9%
r 97841
9.0%
o 94907
8.7%
t 92015
8.5%
A 91450
8.4%
h 89724
8.3%
n 23187
 
2.1%
a 22907
 
2.1%
Other values (21) 97889
9.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 764192
70.4%
Space Separator 196705
 
18.1%
Uppercase Letter 113737
 
10.5%
Dash Punctuation 3981
 
0.4%
Open Punctuation 3686
 
0.3%
Close Punctuation 3686
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 172153
22.5%
e 107209
14.0%
r 97841
12.8%
o 94907
12.4%
t 92015
12.0%
h 89724
11.7%
n 23187
 
3.0%
a 22907
 
3.0%
i 20309
 
2.7%
c 19066
 
2.5%
Other values (8) 24874
 
3.3%
Uppercase Letter
ValueCountFrequency (%)
A 91450
80.4%
M 11353
 
10.0%
S 3228
 
2.8%
C 2767
 
2.4%
P 1578
 
1.4%
R 1578
 
1.4%
O 1243
 
1.1%
N 400
 
0.4%
D 140
 
0.1%
Space Separator
ValueCountFrequency (%)
196705
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3981
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3686
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3686
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 877929
80.8%
Common 208058
 
19.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 172153
19.6%
e 107209
12.2%
r 97841
11.1%
o 94907
10.8%
t 92015
10.5%
A 91450
10.4%
h 89724
10.2%
n 23187
 
2.6%
a 22907
 
2.6%
i 20309
 
2.3%
Other values (17) 66227
 
7.5%
Common
ValueCountFrequency (%)
196705
94.5%
- 3981
 
1.9%
( 3686
 
1.8%
) 3686
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1085987
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
196705
18.1%
l 172153
15.9%
e 107209
9.9%
r 97841
9.0%
o 94907
8.7%
t 92015
8.5%
A 91450
8.4%
h 89724
8.3%
n 23187
 
2.1%
a 22907
 
2.1%
Other values (21) 97889
9.0%

sex
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
Female 51361
Male 47518

Length

Max length 7
Median length 7
Mean length 6.0388657
Min length 5
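
These length figures only add up if each raw value carries one extra character, which is consistent with the Space Separator count of 98879 (one per row) reported further down. The sketch below checks that reading. The leading space is inferred from those numbers rather than stated by the report, so treat it as an assumption.

```python
# Category counts from the report; the leading space in each key is an inference.
counts = {" Female": 51361, " Male": 47518}

n_rows = sum(counts.values())
total_chars = sum(len(value) * n for value, n in counts.items())
mean_length = total_chars / n_rows
print(total_chars, round(mean_length, 7))

# If the whitespace is unwanted, stripping it before modelling is one option.
cleaned = {value.strip(): n for value, n in counts.items()}
print(cleaned)
```

With the assumed leading space, the total matches the 597117 characters reported for this column.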

Characters and Unicode

Total characters 597117
Distinct characters 7
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Female
2nd row Female
3rd row Male
4th row Female
5th row Male

Common Values

Value Count Frequency (%)
Female 51361 51.9%
Male 47518 48.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
female 51361
51.9%
male 47518
48.1%

Most occurring characters

ValueCountFrequency (%)
e 150240
25.2%
98879
16.6%
a 98879
16.6%
l 98879
16.6%
F 51361
 
8.6%
m 51361
 
8.6%
M 47518
 
8.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 399359
66.9%
Space Separator 98879
 
16.6%
Uppercase Letter 98879
 
16.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 150240
37.6%
a 98879
24.8%
l 98879
24.8%
m 51361
 
12.9%
Uppercase Letter
ValueCountFrequency (%)
F 51361
51.9%
M 47518
48.1%
Space Separator
ValueCountFrequency (%)
98879
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 498238
83.4%
Common 98879
 
16.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 150240
30.2%
a 98879
19.8%
l 98879
19.8%
F 51361
 
10.3%
m 51361
 
10.3%
M 47518
 
9.5%
Common
ValueCountFrequency (%)
98879
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 597117
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 150240
25.2%
98879
16.6%
a 98879
16.6%
l 98879
16.6%
F 51361
 
8.6%
m 51361
 
8.6%
M 47518
 
8.0%

member_of_a_labor_union
Categorical

Imbalance 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
Not in universe 89400
No 8034
Yes 1445

Length

Max length 16
Median length 16
Mean length 14.768373
Min length 3

Characters and Unicode

Total characters 1460282
Distinct characters 12
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Common Values

Value Count Frequency (%)
Not in universe 89400 90.4%
No 8034 8.1%
Yes 1445 1.5%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 89400
32.2%
in 89400
32.2%
universe 89400
32.2%
no 8034
 
2.9%
yes 1445
 
0.5%

Most occurring characters

ValueCountFrequency (%)
277679
19.0%
e 180245
12.3%
i 178800
12.2%
n 178800
12.2%
N 97434
 
6.7%
o 97434
 
6.7%
s 90845
 
6.2%
t 89400
 
6.1%
u 89400
 
6.1%
v 89400
 
6.1%
Other values (2) 90845
 
6.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1083724
74.2%
Space Separator 277679
 
19.0%
Uppercase Letter 98879
 
6.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 180245
16.6%
i 178800
16.5%
n 178800
16.5%
o 97434
9.0%
s 90845
8.4%
t 89400
8.2%
u 89400
8.2%
v 89400
8.2%
r 89400
8.2%
Uppercase Letter
ValueCountFrequency (%)
N 97434
98.5%
Y 1445
 
1.5%
Space Separator
ValueCountFrequency (%)
277679
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1182603
81.0%
Common 277679
 
19.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 180245
15.2%
i 178800
15.1%
n 178800
15.1%
N 97434
8.2%
o 97434
8.2%
s 90845
7.7%
t 89400
7.6%
u 89400
7.6%
v 89400
7.6%
r 89400
7.6%
Common
ValueCountFrequency (%)
277679
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1460282
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
277679
19.0%
e 180245
12.3%
i 178800
12.2%
n 178800
12.2%
N 97434
 
6.7%
o 97434
 
6.7%
s 90845
 
6.2%
t 89400
 
6.1%
u 89400
 
6.1%
v 89400
 
6.1%
Other values (2) 90845
 
6.2%

reason_for_unemployment
Categorical

Imbalance 

Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
Not in universe 95749
Job loser 1616
Re-entrant 1024
Job leaver 286
New entrant 204

Length

Max length 15
Median length 15
Mean length 14.827446
Min length 9

Characters and Unicode

Total characters 1466123
Distinct characters 18
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Common Values

Value Count Frequency (%)
Not in universe 95749 96.8%
Job loser 1616 1.6%
Re-entrant 1024 1.0%
Job leaver 286 0.3%
New entrant 204 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 95749
32.7%
in 95749
32.7%
universe 95749
32.7%
job 1902
 
0.7%
loser 1616
 
0.6%
re-entrant 1024
 
0.4%
leaver 286
 
0.1%
new 204
 
0.1%
entrant 204
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e 196142
13.4%
n 193954
13.2%
193604
13.2%
i 191498
13.1%
o 99267
6.8%
r 98879
6.7%
t 98205
6.7%
s 97365
6.6%
v 96035
6.6%
N 95953
6.5%
Other values (8) 105221
7.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1172616
80.0%
Space Separator 193604
 
13.2%
Uppercase Letter 98879
 
6.7%
Dash Punctuation 1024
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 196142
16.7%
n 193954
16.5%
i 191498
16.3%
o 99267
8.5%
r 98879
8.4%
t 98205
8.4%
s 97365
8.3%
v 96035
8.2%
u 95749
8.2%
b 1902
 
0.2%
Other values (3) 3620
 
0.3%
Uppercase Letter
ValueCountFrequency (%)
N 95953
97.0%
J 1902
 
1.9%
R 1024
 
1.0%
Space Separator
ValueCountFrequency (%)
193604
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1024
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1271495
86.7%
Common 194628
 
13.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 196142
15.4%
n 193954
15.3%
i 191498
15.1%
o 99267
7.8%
r 98879
7.8%
t 98205
7.7%
s 97365
7.7%
v 96035
7.6%
N 95953
7.5%
u 95749
7.5%
Other values (6) 8448
 
0.7%
Common
ValueCountFrequency (%)
193604
99.5%
- 1024
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1466123
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 196142
13.4%
n 193954
13.2%
193604
13.2%
i 191498
13.1%
o 99267
6.8%
r 98879
6.7%
t 98205
6.7%
s 97365
6.6%
v 96035
6.6%
N 95953
6.5%
Other values (8) 105221
7.2%

full_or_part_time_employment_stat
Categorical

High correlation 

Distinct 4
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
Children or Armed Forces 60832
FTE 21670
Not Employed 13390
PTE 2987

Length

Max length 24
Median length 24
Mean length 17.13831
Min length 3

Characters and Unicode

Total characters 1694619
Distinct characters 22
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row FTE
2nd row PTE
3rd row Children or Armed Forces
4th row Children or Armed Forces
5th row FTE

Common Values

Value Count Frequency (%)
Children or Armed Forces 60832 61.5%
FTE 21670 21.9%
Not Employed 13390 13.5%
PTE 2987 3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
children 60832
20.6%
or 60832
20.6%
armed 60832
20.6%
forces 60832
20.6%
fte 21670
 
7.4%
not 13390
 
4.5%
employed 13390
 
4.5%
pte 2987
 
1.0%

Most occurring characters

ValueCountFrequency (%)
r 243328
14.4%
e 195886
11.6%
195886
11.6%
o 148444
 
8.8%
d 135054
 
8.0%
F 82502
 
4.9%
m 74222
 
4.4%
l 74222
 
4.4%
h 60832
 
3.6%
s 60832
 
3.6%
Other values (12) 423411
25.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1215486
71.7%
Uppercase Letter 283247
 
16.7%
Space Separator 195886
 
11.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 243328
20.0%
e 195886
16.1%
o 148444
12.2%
d 135054
11.1%
m 74222
 
6.1%
l 74222
 
6.1%
h 60832
 
5.0%
s 60832
 
5.0%
c 60832
 
5.0%
n 60832
 
5.0%
Other values (4) 101002
8.3%
Uppercase Letter
ValueCountFrequency (%)
F 82502
29.1%
C 60832
21.5%
A 60832
21.5%
E 38047
13.4%
T 24657
 
8.7%
N 13390
 
4.7%
P 2987
 
1.1%
Space Separator
ValueCountFrequency (%)
195886
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1498733
88.4%
Common 195886
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 243328
16.2%
e 195886
13.1%
o 148444
9.9%
d 135054
 
9.0%
F 82502
 
5.5%
m 74222
 
5.0%
l 74222
 
5.0%
h 60832
 
4.1%
s 60832
 
4.1%
c 60832
 
4.1%
Other values (11) 362579
24.2%
Common
ValueCountFrequency (%)
195886
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1694619
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 243328
14.4%
e 195886
11.6%
195886
11.6%
o 148444
 
8.8%
d 135054
 
8.0%
F 82502
 
4.9%
m 74222
 
4.4%
l 74222
 
4.4%
h 60832
 
3.6%
s 60832
 
3.6%
Other values (12) 423411
25.0%

capital_gains
Real number (ℝ)

Zeros 

Distinct 123
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 429.59091
Minimum 0
Maximum 99999
Zeros 95157
Zeros (%) 96.2%
Negative 0
Negative (%) 0.0%
Memory size 1.5 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 0
Maximum 99999
Range 99999
Interquartile range (IQR) 0
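
Every quantile up to the 95th is zero because 95157 of the 98879 rows (96.2%) are zero. The sketch below reproduces that with a nearest-rank percentile. Both the nearest-rank definition and the placeholder value 1 for the non-zero rows are assumptions; the report itself may interpolate, which gives the same answer here because both neighbouring values are zero.

```python
import math


def percentile_nearest_rank(sorted_values, q):
    """Smallest value such that at least q percent of the data is at or below it."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


n_rows, n_zeros = 98879, 95157
# Placeholder data: only the position of the non-zero rows matters, not their size.
values = [0] * n_zeros + [1] * (n_rows - n_zeros)

for q in (5, 25, 50, 75, 95):
    print(q, percentile_nearest_rank(values, q))

# The IQR is Q3 - Q1, so it is zero as well.
iqr = percentile_nearest_rank(values, 75) - percentile_nearest_rank(values, 25)
print("IQR", iqr)
```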

Descriptive statistics

Standard deviation 4637.1881
Coefficient of variation (CV) 10.794428
Kurtosis 402.82376
Mean 429.59091
Median Absolute Deviation (MAD) 0
Skewness 19.209214
Sum 42477520
Variance 21503513
Monotonicity Not monotonic
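
Several of these figures are tied together by simple identities: the mean is the sum divided by the row count, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short consistency check, using only numbers printed in this report, might look like this:

```python
n_rows = 98879      # number of observations from the dataset overview
total = 42477520    # Sum
std = 4637.1881     # Standard deviation

mean = total / n_rows   # reported as 429.59091
cv = std / mean         # reported as 10.794428
variance = std ** 2     # reported as 21503513, up to rounding of std

print(mean, cv, variance)
```

Small differences in the last digits are expected because the report rounds each statistic to eight significant figures.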
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
0 95157 96.2%
15024 380 0.4%
7298 289 0.3%
7688 285 0.3%
99999 188 0.2%
5178 106 0.1%
4386 103 0.1%
3103 101 0.1%
5013 84 0.1%
10520 71 0.1%
Other values (113) 2115 2.1%

Value Count Frequency (%)
0 95157 96.2%
114 9 < 0.1%
401 22 < 0.1%
594 45 < 0.1%
914 7 < 0.1%
991 27 < 0.1%
1055 43 < 0.1%
1086 61 0.1%
1111 8 < 0.1%
1140 2 < 0.1%

Value Count Frequency (%)
99999 188 0.2%
41310 3 < 0.1%
34095 2 < 0.1%
27828 44 < 0.1%
25236 9 < 0.1%
25124 13 < 0.1%
20051 36 < 0.1%
18481 8 < 0.1%
15831 8 < 0.1%
15024 380 0.4%

capital_losses
Real number (ℝ)

Zeros 

Distinct 111
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 36.240223
Minimum 0
Maximum 4608
Zeros 96971
Zeros (%) 98.1%
Negative 0
Negative (%) 0.0%
Memory size 1.5 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 0
Maximum 4608
Range 4608
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 266.68642
Coefficient of variation (CV) 7.3588515
Kurtosis 64.450436
Mean 36.240223
Median Absolute Deviation (MAD) 0
Skewness 7.7596025
Sum 3583397
Variance 71121.646
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
0 96971 98.1%
1902 213 0.2%
1977 195 0.2%
1887 168 0.2%
1602 106 0.1%
1485 59 0.1%
1848 54 0.1%
1740 52 0.1%
2415 45 < 0.1%
1590 40 < 0.1%
Other values (101) 976 1.0%

Value Count Frequency (%)
0 96971 98.1%
155 2 < 0.1%
213 6 < 0.1%
323 2 < 0.1%
419 18 < 0.1%
625 11 < 0.1%
653 5 < 0.1%
772 3 < 0.1%
810 5 < 0.1%
880 4 < 0.1%

Value Count Frequency (%)
4608 4 < 0.1%
4356 15 < 0.1%
3900 1 < 0.1%
3770 3 < 0.1%
3683 2 < 0.1%
3500 1 < 0.1%
3175 2 < 0.1%
3004 6 < 0.1%
2824 15 < 0.1%
2788 4 < 0.1%

dividends_from_stocks
Real number (ℝ)

Skewed  Zeros 

Distinct 1140
Distinct (%) 1.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 194.21373
Minimum 0
Maximum 99999
Zeros 88351
Zeros (%) 89.4%
Negative 0
Negative (%) 0.0%
Memory size 1.5 MiB
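
The percentage rows in this summary are the raw counts divided by the 98879 observations in the dataset. A small sketch of that arithmetic, with rounding to one decimal place assumed to match the report's display:

```python
n_rows = 98879   # number of observations from the dataset overview
distinct = 1140  # Distinct
zeros = 88351    # Zeros

distinct_pct = round(100 * distinct / n_rows, 1)
zeros_pct = round(100 * zeros / n_rows, 1)
print(f"Distinct (%) {distinct_pct}%  Zeros (%) {zeros_pct}%")
```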

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 400
Maximum 99999
Range 99999
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 1849.8435
Coefficient of variation (CV) 9.5247824
Kurtosis 940.09239
Mean 194.21373
Median Absolute Deviation (MAD) 0
Skewness 25.258473
Sum 19203659
Variance 3421920.9
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
0 88351 89.4%
100 571 0.6%
500 506 0.5%
200 481 0.5%
1000 474 0.5%
50 402 0.4%
250 281 0.3%
300 271 0.3%
150 262 0.3%
2000 246 0.2%
Other values (1130) 7034 7.1%

Value Count Frequency (%)
0 88351 89.4%
1 233 0.2%
2 97 0.1%
3 53 0.1%
4 37 < 0.1%
5 84 0.1%
6 48 < 0.1%
7 31 < 0.1%
8 41 < 0.1%
9 24 < 0.1%

Value Count Frequency (%)
99999 6 < 0.1%
90000 1 < 0.1%
81000 1 < 0.1%
75000 3 < 0.1%
60000 4 < 0.1%
57678 1 < 0.1%
55000 1 < 0.1%
51000 1 < 0.1%
50110 1 < 0.1%
50000 9 < 0.1%

tax_filer_stat
Categorical

High correlation 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
Joint Filer 39734
Non-Filer 36496
Individual Filer 22649

Length

Max length 16
Median length 11
Mean length 11.407094
Min length 9

Characters and Unicode

Total characters 1127922
Distinct characters 17
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Joint Filer
2nd row Joint Filer
3rd row Non-Filer
4th row Individual Filer
5th row Individual Filer

Common Values

Value Count Frequency (%)
Joint Filer 39734 40.2%
Non-Filer 36496 36.9%
Individual Filer 22649 22.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
filer 62383
38.7%
joint 39734
24.6%
non-filer 36496
22.6%
individual 22649
 
14.0%
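
The word counts above follow from the category counts if each value is lower-cased and split on whitespace, so that "Non-Filer" stays a single token. That tokenization rule is an assumption inferred from the numbers, not something the report states. A minimal sketch:

```python
from collections import Counter

# Category counts copied from the Common Values table for this column.
category_counts = {
    "Joint Filer": 39734,
    "Non-Filer": 36496,
    "Individual Filer": 22649,
}

word_counts = Counter()
for value, n in category_counts.items():
    # Assumed tokenization: lower-case, then split on whitespace.
    for token in value.lower().split():
        word_counts[token] += n

total = sum(word_counts.values())
for word, n in word_counts.most_common():
    print(word, n, f"{100 * n / total:.1f}%")
```

Under this rule "filer" is counted once for each "Joint Filer" and "Individual Filer" row, which gives the 62383 shown in the table.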

Most occurring characters

ValueCountFrequency (%)
i 183911
16.3%
l 121528
10.8%
e 98879
8.8%
r 98879
8.8%
n 98879
8.8%
F 98879
8.8%
o 76230
 
6.8%
62383
 
5.5%
d 45298
 
4.0%
J 39734
 
3.5%
Other values (7) 203322
18.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 831285
73.7%
Uppercase Letter 197758
 
17.5%
Space Separator 62383
 
5.5%
Dash Punctuation 36496
 
3.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 183911
22.1%
l 121528
14.6%
e 98879
11.9%
r 98879
11.9%
n 98879
11.9%
o 76230
9.2%
d 45298
 
5.4%
t 39734
 
4.8%
v 22649
 
2.7%
u 22649
 
2.7%
Uppercase Letter
ValueCountFrequency (%)
F 98879
50.0%
J 39734
20.1%
N 36496
 
18.5%
I 22649
 
11.5%
Space Separator
ValueCountFrequency (%)
62383
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 36496
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1029043
91.2%
Common 98879
 
8.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 183911
17.9%
l 121528
11.8%
e 98879
9.6%
r 98879
9.6%
n 98879
9.6%
F 98879
9.6%
o 76230
7.4%
d 45298
 
4.4%
J 39734
 
3.9%
t 39734
 
3.9%
Other values (5) 127092
12.4%
Common
ValueCountFrequency (%)
62383
63.1%
- 36496
36.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1127922
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 183911
16.3%
l 121528
10.8%
e 98879
8.8%
r 98879
8.8%
n 98879
8.8%
F 98879
8.8%
o 76230
 
6.8%
62383
 
5.5%
d 45298
 
4.0%
J 39734
 
3.5%
Other values (7) 203322
18.0%

region_of_previous_residence
Categorical

High correlation  Imbalance 

Distinct 6
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
Not in universe 91198
South 2422
West 2041
Midwest 1694
Northeast 1322

Length

Max length 16
Median length 16
Mean length 15.292337
Min length 5

Characters and Unicode

Total characters 1512091
Distinct characters 20
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Common Values

Value Count Frequency (%)
Not in universe 91198 92.2%
South 2422 2.4%
West 2041 2.1%
Midwest 1694 1.7%
Northeast 1322 1.3%
Abroad 202 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 91198
32.4%
in 91198
32.4%
universe 91198
32.4%
south 2422
 
0.9%
west 2041
 
0.7%
midwest 1694
 
0.6%
northeast 1322
 
0.5%
abroad 202
 
0.1%

Most occurring characters

ValueCountFrequency (%)
281275
18.6%
e 187453
12.4%
i 184090
12.2%
n 182396
12.1%
t 99999
 
6.6%
s 96255
 
6.4%
o 95144
 
6.3%
u 93620
 
6.2%
r 92722
 
6.1%
N 92520
 
6.1%
Other values (10) 106617
 
7.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1131937
74.9%
Space Separator 281275
 
18.6%
Uppercase Letter 98879
 
6.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 187453
16.6%
i 184090
16.3%
n 182396
16.1%
t 99999
8.8%
s 96255
8.5%
o 95144
8.4%
u 93620
8.3%
r 92722
8.2%
v 91198
8.1%
h 3744
 
0.3%
Other values (4) 5316
 
0.5%
Uppercase Letter
ValueCountFrequency (%)
N 92520
93.6%
S 2422
 
2.4%
W 2041
 
2.1%
M 1694
 
1.7%
A 202
 
0.2%
Space Separator
ValueCountFrequency (%)
281275
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1230816
81.4%
Common 281275
 
18.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 187453
15.2%
i 184090
15.0%
n 182396
14.8%
t 99999
8.1%
s 96255
7.8%
o 95144
7.7%
u 93620
7.6%
r 92722
7.5%
N 92520
7.5%
v 91198
7.4%
Other values (9) 15419
 
1.3%
Common
ValueCountFrequency (%)
281275
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1512091
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
281275
18.6%
e 187453
12.4%
i 184090
12.2%
n 182396
12.1%
t 99999
 
6.6%
s 96255
 
6.4%
o 95144
 
6.3%
u 93620
 
6.2%
r 92722
 
6.1%
N 92520
 
6.1%
Other values (10) 106617
 
7.1%
Distinct 51
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB

Length

Max length 21
Median length 16
Mean length 15.47016
Min length 2

Characters and Unicode

Total characters 1529674
Distinct characters 46
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Not in universe
5th row Not in universe
ValueCountFrequency (%)
not 91198
32.3%
universe 91198
32.3%
in 91198
32.3%
california 881
 
0.3%
north 623
 
0.2%
utah 533
 
0.2%
new 489
 
0.2%
florida 450
 
0.2%
carolina 447
 
0.2%
330
 
0.1%
Other values (46) 5379
 
1.9%

Most occurring characters

ValueCountFrequency (%)
282726
18.5%
i 188701
12.3%
n 187296
12.2%
e 185188
12.1%
o 96908
 
6.3%
r 95309
 
6.2%
t 93880
 
6.1%
s 93824
 
6.1%
N 92491
 
6.0%
u 91818
 
6.0%
Other values (36) 121533
7.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1146674
75.0%
Space Separator 282726
 
18.5%
Uppercase Letter 99944
 
6.5%
Other Punctuation 330
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 188701
16.5%
n 187296
16.3%
e 185188
16.2%
o 96908
8.5%
r 95309
8.3%
t 93880
8.2%
s 93824
8.2%
u 91818
8.0%
v 91377
8.0%
a 9436
 
0.8%
Other values (14) 12937
 
1.1%
Uppercase Letter
ValueCountFrequency (%)
N 92491
92.5%
C 1563
 
1.6%
M 1192
 
1.2%
A 732
 
0.7%
U 533
 
0.5%
O 504
 
0.5%
I 461
 
0.5%
F 450
 
0.5%
D 404
 
0.4%
W 267
 
0.3%
Other values (10) 1347
 
1.3%
Space Separator
ValueCountFrequency (%)
282726
100.0%
Other Punctuation
ValueCountFrequency (%)
? 330
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1246618
81.5%
Common 283056
 
18.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 188701
15.1%
n 187296
15.0%
e 185188
14.9%
o 96908
7.8%
r 95309
7.6%
t 93880
7.5%
s 93824
7.5%
N 92491
7.4%
u 91818
7.4%
v 91377
7.3%
Other values (34) 29826
 
2.4%
Common
ValueCountFrequency (%)
282726
99.9%
? 330
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1529674
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
282726
18.5%
i 188701
12.3%
n 187296
12.2%
e 185188
12.1%
o 96908
 
6.3%
r 95309
 
6.2%
t 93880
 
6.1%
s 93824
 
6.1%
N 92491
 
6.0%
u 91818
 
6.0%
Other values (36) 121533
7.9%

detailed_household_and_family_stat
Categorical

High correlation 

Distinct 4
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
Primary Householder 58576
Child 31896
Extended Family 4810
Other 3597

Length

Max length 19
Median length 19
Mean length 13.780065
Min length 5

Characters and Unicode

Total characters 1362559
Distinct characters 22
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Primary Householder
2nd row Primary Householder
3rd row Child
4th row Primary Householder
5th row Other

Common Values

Value Count Frequency (%)
Primary Householder 58576 59.2%
Child 31896 32.3%
Extended Family 4810 4.9%
Other 3597 3.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
primary 58576
36.1%
householder 58576
36.1%
child 31896
19.7%
extended 4810
 
3.0%
family 4810
 
3.0%
other 3597
 
2.2%

Most occurring characters

ValueCountFrequency (%)
r 179325
13.2%
e 130369
 
9.6%
o 117152
 
8.6%
d 100092
 
7.3%
i 95282
 
7.0%
l 95282
 
7.0%
h 94069
 
6.9%
m 63386
 
4.7%
a 63386
 
4.7%
y 63386
 
4.7%
Other values (12) 360830
26.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1136908
83.4%
Uppercase Letter 162265
 
11.9%
Space Separator 63386
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 179325
15.8%
e 130369
11.5%
o 117152
10.3%
d 100092
8.8%
i 95282
8.4%
l 95282
8.4%
h 94069
8.3%
m 63386
 
5.6%
a 63386
 
5.6%
y 63386
 
5.6%
Other values (5) 135179
11.9%
Uppercase Letter
ValueCountFrequency (%)
P 58576
36.1%
H 58576
36.1%
C 31896
19.7%
E 4810
 
3.0%
F 4810
 
3.0%
O 3597
 
2.2%
Space Separator
ValueCountFrequency (%)
63386
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1299173
95.3%
Common 63386
 
4.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 179325
13.8%
e 130369
10.0%
o 117152
 
9.0%
d 100092
 
7.7%
i 95282
 
7.3%
l 95282
 
7.3%
h 94069
 
7.2%
m 63386
 
4.9%
a 63386
 
4.9%
y 63386
 
4.9%
Other values (11) 297444
22.9%
Common
ValueCountFrequency (%)
63386
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1362559
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 179325
13.2%
e 130369
 
9.6%
o 117152
 
8.6%
d 100092
 
7.3%
i 95282
 
7.0%
l 95282
 
7.0%
h 94069
 
6.9%
m 63386
 
4.7%
a 63386
 
4.7%
y 63386
 
4.7%
Other values (12) 360830
26.5%

detailed_household_summary_in_household
Categorical

High correlation 

Distinct8
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.5 MiB
Householder
37939 
Child under 18 never married
24195 
Spouse of householder
20648 
Child 18 or older
7337 
Other relative of householder
4813 
Other values (3)
3947 

Length

Max length37
Median length30
Mean length20.173808
Min length12

Characters and Unicode

Total characters 1994766
Distinct characters 29
Distinct categories 5
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
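
As a rough illustration of the character-property analysis described above, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` and the three-value sample are hypothetical; this is not the report generator's own code, only one way to reproduce the kind of per-category counts listed in this section.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that occur in this variable.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
}

def category_counts(values):
    """Count the Unicode general category of every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Hypothetical mini-sample built from three labels of this variable.
sample = [
    "Householder",
    "Child 18 or older",
    "Group Quarters- Secondary individual",
]
counts = category_counts(sample)
for code, n in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), n)
```

Run over a whole column instead of the sample, the same tally gives the "Most occurring categories" breakdown shown for each categorical variable.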

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Spouse of householder
2nd row Spouse of householder
3rd row Child under 18 never married
4th row Householder
5th row Nonrelative of householder

Common Values

Value                                   Count   Frequency (%)
Householder                             37939   38.4%
Child under 18 never married            24195   24.5%
Spouse of householder                   20648   20.9%
Child 18 or older                        7337    7.4%
Other relative of householder            4813    4.9%
Nonrelative of householder               3871    3.9%
Group Quarters- Secondary individual       54    0.1%
Child under 18 ever married                22   < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
householder 67271
23.9%
child 31554
11.2%
18 31554
11.2%
of 29332
10.4%
under 24217
 
8.6%
married 24217
 
8.6%
never 24195
 
8.6%
spouse 20648
 
7.3%
older 7337
 
2.6%
or 7337
 
2.6%
Other values (8) 13735
 
4.9%

Most occurring characters

ValueCountFrequency (%)
e 281684
14.1%
281397
14.1%
o 203175
10.2%
r 192526
9.7%
d 154758
7.8%
h 132970
 
6.7%
l 114900
 
5.8%
u 112298
 
5.6%
s 87973
 
4.4%
i 64617
 
3.2%
Other values (19) 368468
18.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1551220
77.8%
Space Separator 281397
 
14.1%
Uppercase Letter 98987
 
5.0%
Decimal Number 63108
 
3.2%
Dash Punctuation 54
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 281684
18.2%
o 203175
13.1%
r 192526
12.4%
d 154758
10.0%
h 132970
8.6%
l 114900
7.4%
u 112298
 
7.2%
s 87973
 
5.7%
i 64617
 
4.2%
n 52391
 
3.4%
Other values (8) 153928
9.9%
Uppercase Letter
ValueCountFrequency (%)
H 37939
38.3%
C 31554
31.9%
S 20702
20.9%
O 4813
 
4.9%
N 3871
 
3.9%
G 54
 
0.1%
Q 54
 
0.1%
Decimal Number
ValueCountFrequency (%)
8 31554
50.0%
1 31554
50.0%
Space Separator
ValueCountFrequency (%)
281397
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 54
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1650207
82.7%
Common 344559
 
17.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 281684
17.1%
o 203175
12.3%
r 192526
11.7%
d 154758
9.4%
h 132970
8.1%
l 114900
7.0%
u 112298
 
6.8%
s 87973
 
5.3%
i 64617
 
3.9%
n 52391
 
3.2%
Other values (15) 252915
15.3%
Common
ValueCountFrequency (%)
281397
81.7%
8 31554
 
9.2%
1 31554
 
9.2%
- 54
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1994766
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 281684
14.1%
281397
14.1%
o 203175
10.2%
r 192526
9.7%
d 154758
7.8%
h 132970
 
6.7%
l 114900
 
5.8%
u 112298
 
5.6%
s 87973
 
4.4%
i 64617
 
3.2%
Other values (19) 368468
18.5%

instance_weight
Real number (ℝ)

Distinct64741
Distinct (%)65.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1741.2191
Minimum43.26
Maximum16258.2
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.5 MiB

Quantile statistics

Minimum 43.26
5-th percentile 393.667
Q1 1058.14
median 1616.89
Q3 2190.47
95-th percentile 3592.968
Maximum 16258.2
Range 16214.94
Interquartile range (IQR) 1132.33

Descriptive statistics

Standard deviation 996.25211
Coefficient of variation (CV) 0.5721578
Kurtosis 5.5538814
Mean 1741.2191
Median Absolute Deviation (MAD) 563.4
Skewness 1.4480311
Sum 1.7217001 × 10^8
Variance 992518.27
Monotonicity Not monotonic
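
Several of the figures above are derived from others, which makes a quick consistency check possible. The sketch below uses only numbers copied from this section and assumes the usual definitions: CV as standard deviation over mean, variance as the squared standard deviation, IQR as Q3 minus Q1, and range as maximum minus minimum. The variable names are illustrative.

```python
# Figures copied from the instance_weight summary above.
mean = 1741.2191
std = 996.25211
q1, q3 = 1058.14, 2190.47
minimum, maximum = 43.26, 16258.2

# Derived statistics under the usual textbook definitions.
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.2f}")
print(f"IQR      = {iqr:.2f}")
print(f"Range    = {value_range:.2f}")
```

Each derived value agrees with the reported one to the precision shown in the report.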
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
707.9 22
 
< 0.1%
1215.77 19
 
< 0.1%
1362.16 19
 
< 0.1%
1378.71 19
 
< 0.1%
1831.77 16
 
< 0.1%
1291.46 15
 
< 0.1%
2013.21 15
 
< 0.1%
1386.38 15
 
< 0.1%
1280.35 15
 
< 0.1%
2228.01 15
 
< 0.1%
Other values (64731) 98709
99.8%
ValueCountFrequency (%)
43.26 1
< 0.1%
47.83 1
< 0.1%
50.38 1
< 0.1%
50.46 1
< 0.1%
52.43 1
< 0.1%
53.7 2
< 0.1%
54.88 1
< 0.1%
56.45 1
< 0.1%
58.55 1
< 0.1%
58.65 1
< 0.1%
ValueCountFrequency (%)
16258.2 1
< 0.1%
14547.9 1
< 0.1%
13388.6 1
< 0.1%
13145.1 1
< 0.1%
12960.2 1
< 0.1%
12739.2 1
< 0.1%
12554.3 1
< 0.1%
11688.2 1
< 0.1%
11627.5 1
< 0.1%
11254.2 2
< 0.1%

migration_code_change_in_msa
Categorical

High correlation 

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.5 MiB
Not in universe
50354 
No movement
41044 
MSA movement
5280 
Non-MSA movement
 
1342
Mixed movement
 
658

Length

Max length16
Median length15
Mean length13.182283
Min length11

Characters and Unicode

Total characters 1303451
Distinct characters 21
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row No movement
5th row Not in universe

Common Values

Value               Count   Frequency (%)
Not in universe     50354   50.9%
No movement         41044   41.5%
MSA movement         5280    5.3%
Non-MSA movement     1342    1.4%
Mixed movement        658    0.7%
International         201    0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 50354
20.3%
in 50354
20.3%
universe 50354
20.3%
movement 48324
19.5%
no 41044
16.6%
msa 5280
 
2.1%
non-msa 1342
 
0.5%
mixed 658
 
0.3%
international 201
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e 198215
15.2%
n 150977
11.6%
149032
11.4%
o 141265
10.8%
i 101567
7.8%
t 99080
7.6%
v 98678
7.6%
m 96648
7.4%
N 92740
7.1%
r 50555
 
3.9%
Other values (11) 124694
9.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1039612
79.8%
Space Separator 149032
 
11.4%
Uppercase Letter 113465
 
8.7%
Dash Punctuation 1342
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 198215
19.1%
n 150977
14.5%
o 141265
13.6%
i 101567
9.8%
t 99080
9.5%
v 98678
9.5%
m 96648
9.3%
r 50555
 
4.9%
s 50354
 
4.8%
u 50354
 
4.8%
Other values (4) 1919
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
N 92740
81.7%
M 7280
 
6.4%
S 6622
 
5.8%
A 6622
 
5.8%
I 201
 
0.2%
Space Separator
ValueCountFrequency (%)
149032
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1342
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1153077
88.5%
Common 150374
 
11.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 198215
17.2%
n 150977
13.1%
o 141265
12.3%
i 101567
8.8%
t 99080
8.6%
v 98678
8.6%
m 96648
8.4%
N 92740
8.0%
r 50555
 
4.4%
s 50354
 
4.4%
Other values (9) 72998
 
6.3%
Common
ValueCountFrequency (%)
149032
99.1%
- 1342
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1303451
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 198215
15.2%
n 150977
11.6%
149032
11.4%
o 141265
10.8%
i 101567
7.8%
t 99080
7.6%
v 98678
7.6%
m 96648
7.4%
N 92740
7.1%
r 50555
 
3.9%
Other values (11) 124694
9.6%

migration_code_change_in_reg
Categorical

High correlation 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.5 MiB
Not in universe
50154 
Same area
45912 
Different area
 
2813

Length

Max length15
Median length15
Mean length12.185601
Min length9

Characters and Unicode

Total characters 1204900
Distinct characters 16
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Same area
5th row Not in universe

Common Values

Value              Count   Frequency (%)
Not in universe    50154   50.7%
Same area          45912   46.4%
Different area      2813    2.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 50154
20.2%
in 50154
20.2%
universe 50154
20.2%
area 48725
19.7%
same 45912
18.5%
different 2813
 
1.1%

Most occurring characters

ValueCountFrequency (%)
e 200571
16.6%
149033
12.4%
a 143362
11.9%
i 103121
8.6%
n 103121
8.6%
r 101692
8.4%
t 52967
 
4.4%
N 50154
 
4.2%
o 50154
 
4.2%
u 50154
 
4.2%
Other values (6) 200571
16.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 956988
79.4%
Space Separator 149033
 
12.4%
Uppercase Letter 98879
 
8.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 200571
21.0%
a 143362
15.0%
i 103121
10.8%
n 103121
10.8%
r 101692
10.6%
t 52967
 
5.5%
o 50154
 
5.2%
u 50154
 
5.2%
v 50154
 
5.2%
s 50154
 
5.2%
Other values (2) 51538
 
5.4%
Uppercase Letter
ValueCountFrequency (%)
N 50154
50.7%
S 45912
46.4%
D 2813
 
2.8%
Space Separator
ValueCountFrequency (%)
149033
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1055867
87.6%
Common 149033
 
12.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 200571
19.0%
a 143362
13.6%
i 103121
9.8%
n 103121
9.8%
r 101692
9.6%
t 52967
 
5.0%
N 50154
 
4.8%
o 50154
 
4.8%
u 50154
 
4.8%
v 50154
 
4.8%
Other values (5) 150417
14.2%
Common
ValueCountFrequency (%)
149033
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1204900
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 200571
16.6%
149033
12.4%
a 143362
11.9%
i 103121
8.6%
n 103121
8.6%
r 101692
8.4%
t 52967
 
4.4%
N 50154
 
4.2%
o 50154
 
4.2%
u 50154
 
4.2%
Other values (6) 200571
16.6%

migration_code_move_within_reg
Categorical

High correlation  Imbalance 

Distinct10
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.5 MiB
?
49470 
Nonmover
41044 
Same county
 
4868
Different county same state
 
1328
Not in universe
 
684
Other values (5)
 
1485

Length

Max length29
Median length2
Mean length6.1623499
Min length2

Characters and Unicode

Total characters 609327
Distinct characters 26
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row ?
2nd row ?
3rd row ?
4th row Nonmover
5th row ?

Common Values

Value                           Count   Frequency (%)
?                               49470   50.0%
Nonmover                        41044   41.5%
Same county                      4868    4.9%
Different county same state      1328    1.3%
Not in universe                   684    0.7%
Different state in South          475    0.5%
Different state in West           358    0.4%
Different state in Midwest        242    0.2%
Different state in Northeast      208    0.2%
Abroad                            202    0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
49470
43.8%
nonmover 41044
36.3%
same 6196
 
5.5%
county 6196
 
5.5%
different 2611
 
2.3%
state 2611
 
2.3%
in 1967
 
1.7%
not 684
 
0.6%
universe 684
 
0.6%
south 475
 
0.4%
Other values (4) 1010
 
0.9%

Most occurring characters

ValueCountFrequency (%)
112948
18.5%
o 89853
14.7%
e 57249
9.4%
n 52502
8.6%
? 49470
8.1%
m 47240
7.8%
r 44749
 
7.3%
N 41936
 
6.9%
v 41728
 
6.8%
t 16204
 
2.7%
Other values (16) 55448
9.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 396217
65.0%
Space Separator 112948
 
18.5%
Uppercase Letter 50692
 
8.3%
Other Punctuation 49470
 
8.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 89853
22.7%
e 57249
14.4%
n 52502
13.3%
m 47240
11.9%
r 44749
11.3%
v 41728
10.5%
t 16204
 
4.1%
a 9217
 
2.3%
u 7355
 
1.9%
c 6196
 
1.6%
Other values (8) 23924
 
6.0%
Uppercase Letter
ValueCountFrequency (%)
N 41936
82.7%
S 5343
 
10.5%
D 2611
 
5.2%
W 358
 
0.7%
M 242
 
0.5%
A 202
 
0.4%
Space Separator
ValueCountFrequency (%)
112948
100.0%
Other Punctuation
ValueCountFrequency (%)
? 49470
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 446909
73.3%
Common 162418
 
26.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 89853
20.1%
e 57249
12.8%
n 52502
11.7%
m 47240
10.6%
r 44749
10.0%
N 41936
9.4%
v 41728
9.3%
t 16204
 
3.6%
a 9217
 
2.1%
u 7355
 
1.6%
Other values (14) 38876
8.7%
Common
ValueCountFrequency (%)
112948
69.5%
? 49470
30.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 609327
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
112948
18.5%
o 89853
14.7%
e 57249
9.4%
n 52502
8.6%
? 49470
8.1%
m 47240
7.8%
r 44749
 
7.3%
N 41936
 
6.9%
v 41728
 
6.8%
t 16204
 
2.7%
Other values (16) 55448
9.1%

live_in_this_house_1_year_ago
Categorical

High correlation 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.5 MiB
Not in universe
50154 
Yes
41044 
No
7681 

Length

Max length15
Median length15
Mean length9.5018052
Min length3

Characters and Unicode

Total characters 939529
Distinct characters 12
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Yes
5th row Not in universe

Common Values

Value              Count   Frequency (%)
Not in universe    50154   50.7%
Yes                41044   41.5%
No                  7681    7.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 50154
25.2%
in 50154
25.2%
universe 50154
25.2%
yes 41044
20.6%
no 7681
 
3.9%

Most occurring characters

ValueCountFrequency (%)
149033
15.9%
e 141352
15.0%
i 100308
10.7%
n 100308
10.7%
s 91198
9.7%
N 57835
 
6.2%
o 57835
 
6.2%
t 50154
 
5.3%
u 50154
 
5.3%
v 50154
 
5.3%
Other values (2) 91198
9.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 691617
73.6%
Space Separator 149033
 
15.9%
Uppercase Letter 98879
 
10.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 141352
20.4%
i 100308
14.5%
n 100308
14.5%
s 91198
13.2%
o 57835
8.4%
t 50154
 
7.3%
u 50154
 
7.3%
v 50154
 
7.3%
r 50154
 
7.3%
Uppercase Letter
ValueCountFrequency (%)
N 57835
58.5%
Y 41044
41.5%
Space Separator
ValueCountFrequency (%)
149033
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 790496
84.1%
Common 149033
 
15.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 141352
17.9%
i 100308
12.7%
n 100308
12.7%
s 91198
11.5%
N 57835
7.3%
o 57835
7.3%
t 50154
 
6.3%
u 50154
 
6.3%
v 50154
 
6.3%
r 50154
 
6.3%
Common
ValueCountFrequency (%)
149033
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 939529
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
149033
15.9%
e 141352
15.0%
i 100308
10.7%
n 100308
10.7%
s 91198
9.7%
N 57835
 
6.2%
o 57835
 
6.2%
t 50154
 
5.3%
u 50154
 
5.3%
v 50154
 
5.3%
Other values (2) 91198
9.7%

migration_prev_res_in_sunbelt
Categorical

High correlation  Imbalance 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.5 MiB
Not in universe
91198 
No
 
4803
Yes
 
2878

Length

Max length15
Median length15
Mean length14.096937
Min length3

Characters and Unicode

Total characters 1393891
Distinct characters 12
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Common Values

Value              Count   Frequency (%)
Not in universe    91198   92.2%
No                  4803    4.9%
Yes                 2878    2.9%
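
This variable carries an Imbalance alert, and the table above shows why: one value covers 92.2% of rows. The report does not state how the alert is computed, so the sketch below uses one common formulation as an assumption: one minus the Shannon entropy of the class distribution divided by its maximum possible value. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/H_max for a list of class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean that
    a single class dominates.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # a single class has no distribution to compare
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(len(probs))

# Counts copied from the Common Values table above.
sunbelt_counts = [91198, 4803, 2878]
score = imbalance_score(sunbelt_counts)
print(f"Imbalance score: {score:.3f}")
```

A uniform distribution scores 0 under this definition, so the further the printed value sits from 0, the more the dominant category crowds out the others.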

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 91198
32.4%
in 91198
32.4%
universe 91198
32.4%
no 4803
 
1.7%
yes 2878
 
1.0%

Most occurring characters

ValueCountFrequency (%)
190077
13.6%
e 185274
13.3%
i 182396
13.1%
n 182396
13.1%
N 96001
6.9%
o 96001
6.9%
s 94076
6.7%
t 91198
6.5%
u 91198
6.5%
v 91198
6.5%
Other values (2) 94076
6.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1104935
79.3%
Space Separator 190077
 
13.6%
Uppercase Letter 98879
 
7.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 185274
16.8%
i 182396
16.5%
n 182396
16.5%
o 96001
8.7%
s 94076
8.5%
t 91198
8.3%
u 91198
8.3%
v 91198
8.3%
r 91198
8.3%
Uppercase Letter
ValueCountFrequency (%)
N 96001
97.1%
Y 2878
 
2.9%
Space Separator
ValueCountFrequency (%)
190077
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1203814
86.4%
Common 190077
 
13.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 185274
15.4%
i 182396
15.2%
n 182396
15.2%
N 96001
8.0%
o 96001
8.0%
s 94076
7.8%
t 91198
7.6%
u 91198
7.6%
v 91198
7.6%
r 91198
7.6%
Common
ValueCountFrequency (%)
190077
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1393891
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
190077
13.6%
e 185274
13.3%
i 182396
13.1%
n 182396
13.1%
N 96001
6.9%
o 96001
6.9%
s 94076
6.7%
t 91198
6.5%
u 91198
6.5%
v 91198
6.5%
Other values (2) 94076
6.7%

num_persons_worked_for_employer
Real number (ℝ)

High correlation  Zeros 

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.9735839
Minimum0
Maximum6
Zeros47009
Zeros (%)47.5%
Negative0
Negative (%)0.0%
Memory size1.5 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 1
Q3 4
95-th percentile 6
Maximum 6
Range 6
Interquartile range (IQR) 4

Descriptive statistics

Standard deviation 2.3676159
Coefficient of variation (CV) 1.199653
Kurtosis -1.0968823
Mean 1.9735839
Median Absolute Deviation (MAD) 1
Skewness 0.74035049
Sum 195146
Variance 5.6056049
Monotonicity Not monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
0 47009
47.5%
6 18328
 
18.5%
1 11641
 
11.8%
4 7059
 
7.1%
3 6836
 
6.9%
2 5079
 
5.1%
5 2927
 
3.0%
ValueCountFrequency (%)
0 47009
47.5%
1 11641
 
11.8%
2 5079
 
5.1%
3 6836
 
6.9%
4 7059
 
7.1%
5 2927
 
3.0%
6 18328
 
18.5%
ValueCountFrequency (%)
6 18328
 
18.5%
5 2927
 
3.0%
4 7059
 
7.1%
3 6836
 
6.9%
2 5079
 
5.1%
1 11641
 
11.8%
0 47009
47.5%
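
Because this variable takes only seven distinct values, its summary statistics can be recovered from the frequency table alone. The sketch below recomputes the sum, mean, share of zeros, and variance from the counts listed above. Using the sample variance (dividing by n - 1) is an assumption, chosen because it reproduces the reported figure.

```python
# Value -> count, copied from the frequency table above.
freq = {0: 47009, 1: 11641, 2: 5079, 3: 6836, 4: 7059, 5: 2927, 6: 18328}

n = sum(freq.values())                               # number of observations
total = sum(value * count for value, count in freq.items())
mean = total / n
zeros_pct = 100 * freq[0] / n

# Sample variance (ddof = 1); an assumption that matches the reported value.
squared_dev = sum(count * (value - mean) ** 2 for value, count in freq.items())
variance = squared_dev / (n - 1)

print(f"n = {n}, sum = {total}")
print(f"mean = {mean:.7f}")
print(f"zeros = {zeros_pct:.1f}%")
print(f"variance = {variance:.7f}")
```

The same approach works for any low-cardinality numeric column where the full value table is available.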

family_members_under_18
Categorical

High correlation  Imbalance 

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.5 MiB
Not in universe
72372 
Both parents present
18580 
Mother only present
 
6181
Father only present
 
941
Neither parent present
 
805

Length

Max length23
Median length16
Mean length17.284631
Min length16

Characters and Unicode

Total characters 1709087
Distinct characters 19
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Both parents present
4th row Not in universe
5th row Not in universe

Common Values

Value                     Count   Frequency (%)
Not in universe           72372   73.2%
Both parents present      18580   18.8%
Mother only present        6181    6.3%
Father only present         941    1.0%
Neither parent present      805    0.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 72372
24.4%
in 72372
24.4%
universe 72372
24.4%
present 26507
 
8.9%
both 18580
 
6.3%
parents 18580
 
6.3%
only 7122
 
2.4%
mother 6181
 
2.1%
father 941
 
0.3%
neither 805
 
0.3%

Most occurring characters

ValueCountFrequency (%)
296637
17.4%
e 225875
13.2%
n 197758
11.6%
i 145549
8.5%
t 144771
8.5%
r 126191
7.4%
s 117459
 
6.9%
o 104255
 
6.1%
N 73177
 
4.3%
u 72372
 
4.2%
Other values (9) 205043
12.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1313571
76.9%
Space Separator 296637
 
17.4%
Uppercase Letter 98879
 
5.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 225875
17.2%
n 197758
15.1%
i 145549
11.1%
t 144771
11.0%
r 126191
9.6%
s 117459
8.9%
o 104255
7.9%
u 72372
 
5.5%
v 72372
 
5.5%
p 45892
 
3.5%
Other values (4) 61077
 
4.6%
Uppercase Letter
ValueCountFrequency (%)
N 73177
74.0%
B 18580
 
18.8%
M 6181
 
6.3%
F 941
 
1.0%
Space Separator
ValueCountFrequency (%)
296637
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1412450
82.6%
Common 296637
 
17.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 225875
16.0%
n 197758
14.0%
i 145549
10.3%
t 144771
10.2%
r 126191
8.9%
s 117459
8.3%
o 104255
7.4%
N 73177
 
5.2%
u 72372
 
5.1%
v 72372
 
5.1%
Other values (8) 132671
9.4%
Common
ValueCountFrequency (%)
296637
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1709087
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
296637
17.4%
e 225875
13.2%
n 197758
11.6%
i 145549
8.5%
t 144771
8.5%
r 126191
7.4%
s 117459
 
6.9%
o 104255
 
6.1%
N 73177
 
4.3%
u 72372
 
4.2%
Other values (9) 205043
12.0%

country_of_birth_father
Categorical

High correlation  Imbalance 

Distinct43
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.5 MiB
United-States
78519 
Mexico
 
5030
?
 
3425
Puerto-Rico
 
1285
Italy
 
1119
Other values (38)
9501 

Length

Max length29
Median length14
Mean length12.644738
Min length2

Characters and Unicode

Total characters 1250299
Distinct characters 47
Distinct categories 7
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Mexico
2nd row United-States
3rd row United-States
4th row United-States
5th row United-States

Common Values

Value                 Count   Frequency (%)
United-States         78519   79.4%
Mexico                 5030    5.1%
?                      3425    3.5%
Puerto-Rico            1285    1.3%
Italy                  1119    1.1%
Dominican-Republic      681    0.7%
Canada                  659    0.7%
Germany                 648    0.7%
Poland                  629    0.6%
Philippines             591    0.6%
Other values (33)      6293    6.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
united-states 78519
78.9%
mexico 5030
 
5.1%
3425
 
3.4%
puerto-rico 1285
 
1.3%
italy 1119
 
1.1%
dominican-republic 681
 
0.7%
canada 659
 
0.7%
germany 648
 
0.7%
poland 629
 
0.6%
philippines 591
 
0.6%
Other values (39) 6904
 
6.9%

Most occurring characters

ValueCountFrequency (%)
t 239312
19.1%
e 167264
13.4%
99490
8.0%
a 92045
 
7.4%
i 91125
 
7.3%
n 85629
 
6.8%
d 82073
 
6.6%
- 81083
 
6.5%
S 79547
 
6.4%
s 79441
 
6.4%
Other values (37) 153290
12.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 888711
71.1%
Uppercase Letter 177369
 
14.2%
Space Separator 99490
 
8.0%
Dash Punctuation 81083
 
6.5%
Other Punctuation 3494
 
0.3%
Open Punctuation 76
 
< 0.1%
Close Punctuation 76
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 239312
26.9%
e 167264
18.8%
a 92045
 
10.4%
i 91125
 
10.3%
n 85629
 
9.6%
d 82073
 
9.2%
s 79441
 
8.9%
o 11411
 
1.3%
c 8784
 
1.0%
l 5843
 
0.7%
Other values (11) 25784
 
2.9%
Uppercase Letter
ValueCountFrequency (%)
S 79547
44.8%
U 78671
44.4%
M 5030
 
2.8%
P 2876
 
1.6%
C 2048
 
1.2%
R 1966
 
1.1%
I 1912
 
1.1%
G 1161
 
0.7%
E 1076
 
0.6%
D 681
 
0.4%
Other values (10) 2401
 
1.4%
Other Punctuation
ValueCountFrequency (%)
? 3425
98.0%
& 69
 
2.0%
Space Separator
ValueCountFrequency (%)
99490
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 81083
100.0%
Open Punctuation
ValueCountFrequency (%)
( 76
100.0%
Close Punctuation
ValueCountFrequency (%)
) 76
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1066080
85.3%
Common 184219
 
14.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 239312
22.4%
e 167264
15.7%
a 92045
 
8.6%
i 91125
 
8.5%
n 85629
 
8.0%
d 82073
 
7.7%
S 79547
 
7.5%
s 79441
 
7.5%
U 78671
 
7.4%
o 11411
 
1.1%
Other values (31) 59562
 
5.6%
Common
ValueCountFrequency (%)
99490
54.0%
- 81083
44.0%
? 3425
 
1.9%
( 76
 
< 0.1%
) 76
 
< 0.1%
& 69
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1250299
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 239312
19.1%
e 167264
13.4%
99490
8.0%
a 92045
 
7.4%
i 91125
 
7.3%
n 85629
 
6.8%
d 82073
 
6.6%
- 81083
 
6.5%
S 79547
 
6.4%
s 79441
 
6.4%
Other values (37) 153290
12.3%

country_of_birth_mother
Categorical

High correlation  Imbalance 

Distinct43
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.5 MiB
United-States
79165 
Mexico
 
4969
?
 
3070
Puerto-Rico
 
1190
Italy
 
917
Other values (38)
9568 

Length

Max length29
Median length14
Mean length12.69044
Min length2

Characters and Unicode

Total characters 1254818
Distinct characters 47
Distinct categories 7
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Mexico
2nd row United-States
3rd row United-States
4th row United-States
5th row United-States

Common Values

Value                 Count   Frequency (%)
United-States         79165   80.1%
Mexico                 4969    5.0%
?                      3070    3.1%
Puerto-Rico            1190    1.2%
Italy                   917    0.9%
Canada                  702    0.7%
Germany                 676    0.7%
Philippines             644    0.7%
Cuba                    594    0.6%
Poland                  585    0.6%
Other values (33)      6367    6.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
united-states 79165
79.6%
mexico 4969
 
5.0%
3070
 
3.1%
puerto-rico 1190
 
1.2%
italy 917
 
0.9%
canada 702
 
0.7%
germany 676
 
0.7%
philippines 644
 
0.6%
cuba 594
 
0.6%
poland 585
 
0.6%
Other values (39) 6961
 
7.0%

Most occurring characters

ValueCountFrequency (%)
t 240965
19.2%
e 168270
13.4%
99473
7.9%
a 92713
 
7.4%
i 91332
 
7.3%
n 86249
 
6.9%
d 82819
 
6.6%
- 81508
 
6.5%
S 80234
 
6.4%
s 80116
 
6.4%
Other values (37) 151139
12.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 892492
71.1%
Uppercase Letter 178093
 
14.2%
Space Separator 99473
 
7.9%
Dash Punctuation 81508
 
6.5%
Other Punctuation 3128
 
0.2%
Open Punctuation 62
 
< 0.1%
Close Punctuation 62
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 240965
27.0%
e 168270
18.9%
a 92713
 
10.4%
i 91332
 
10.2%
n 86249
 
9.7%
d 82819
 
9.3%
s 80116
 
9.0%
o 11006
 
1.2%
c 8292
 
0.9%
l 5603
 
0.6%
Other values (11) 25127
 
2.8%
Uppercase Letter
ValueCountFrequency (%)
S 80234
45.1%
U 79289
44.5%
M 4969
 
2.8%
P 2784
 
1.6%
C 2054
 
1.2%
R 1728
 
1.0%
I 1721
 
1.0%
E 1161
 
0.7%
G 1122
 
0.6%
D 538
 
0.3%
Other values (10) 2493
 
1.4%
Other Punctuation
ValueCountFrequency (%)
? 3070
98.1%
& 58
 
1.9%
Space Separator
ValueCountFrequency (%)
99473
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 81508
100.0%
Open Punctuation
ValueCountFrequency (%)
( 62
100.0%
Close Punctuation
ValueCountFrequency (%)
) 62
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1070585
85.3%
Common 184233
 
14.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 240965
22.5%
e 168270
15.7%
a 92713
 
8.7%
i 91332
 
8.5%
n 86249
 
8.1%
d 82819
 
7.7%
S 80234
 
7.5%
s 80116
 
7.5%
U 79289
 
7.4%
o 11006
 
1.0%
Other values (31) 57592
 
5.4%
Common
ValueCountFrequency (%)
99473
54.0%
- 81508
44.2%
? 3070
 
1.7%
( 62
 
< 0.1%
) 62
 
< 0.1%
& 58
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1254818
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 240965
19.2%
e 168270
13.4%
99473
7.9%
a 92713
 
7.4%
i 91332
 
7.3%
n 86249
 
6.9%
d 82819
 
6.6%
- 81508
 
6.5%
S 80234
 
6.4%
s 80116
 
6.4%
Other values (37) 151139
12.0%

country_of_birth_self
Categorical

High correlation  Imbalance 

Distinct43
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.5 MiB
United-States
87478 
Mexico
 
2934
?
 
1763
Puerto-Rico
 
691
Philippines
 
454
Other values (38)
 
5559

Length

Max length29
Median length14
Mean length13.259337
Min length2

Characters and Unicode

Total characters 1311070
Distinct characters 47
Distinct categories 7
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Mexico
2nd row United-States
3rd row United-States
4th row United-States
5th row United-States

Common Values

Value                 Count   Frequency (%)
United-States         87478   88.5%
Mexico                 2934    3.0%
?                      1763    1.8%
Puerto-Rico             691    0.7%
Philippines             454    0.5%
Cuba                    428    0.4%
Germany                 420    0.4%
Canada                  346    0.3%
El-Salvador             342    0.3%
Dominican-Republic      327    0.3%
Other values (33)      3696    3.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
united-states 87478
88.0%
mexico 2934
 
3.0%
1763
 
1.8%
puerto-rico 691
 
0.7%
philippines 454
 
0.5%
cuba 428
 
0.4%
germany 420
 
0.4%
canada 346
 
0.3%
el-salvador 342
 
0.3%
dominican-republic 327
 
0.3%
Other values (39) 4168
 
4.2%

Most occurring characters

ValueCountFrequency (%)
t 264267
20.2%
e 180908
13.8%
99351
 
7.6%
a 95334
 
7.3%
i 95149
 
7.3%
n 91503
 
7.0%
d 89343
 
6.8%
- 88895
 
6.8%
S 88183
 
6.7%
s 88122
 
6.7%
Other values (37) 130015
9.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 934298
71.3%
Uppercase Letter 186623
 
14.2%
Space Separator 99351
 
7.6%
Dash Punctuation 88895
 
6.8%
Other Punctuation 1811
 
0.1%
Open Punctuation 46
 
< 0.1%
Close Punctuation 46
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 264267
28.3%
e 180908
19.4%
a 95334
 
10.2%
i 95149
 
10.2%
n 91503
 
9.8%
d 89343
 
9.6%
s 88122
 
9.4%
o 6555
 
0.7%
c 4933
 
0.5%
x 2934
 
0.3%
Other values (11) 15250
 
1.6%
Uppercase Letter
ValueCountFrequency (%)
S 88183
47.3%
U 87570
46.9%
M 2934
 
1.6%
P 1542
 
0.8%
C 1285
 
0.7%
R 1018
 
0.5%
G 707
 
0.4%
E 701
 
0.4%
I 627
 
0.3%
J 354
 
0.2%
Other values (10) 1702
 
0.9%
Other Punctuation
ValueCountFrequency (%)
? 1763
97.3%
& 48
 
2.7%
Space Separator
ValueCountFrequency (%)
99351
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 88895
100.0%
Open Punctuation
ValueCountFrequency (%)
( 46
100.0%
Close Punctuation
ValueCountFrequency (%)
) 46
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1120921
85.5%
Common 190149
 
14.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 264267
23.6%
e 180908
16.1%
a 95334
 
8.5%
i 95149
 
8.5%
n 91503
 
8.2%
d 89343
 
8.0%
S 88183
 
7.9%
s 88122
 
7.9%
U 87570
 
7.8%
o 6555
 
0.6%
Other values (31) 33987
 
3.0%
Common
ValueCountFrequency (%)
99351
52.2%
- 88895
46.8%
? 1763
 
0.9%
& 48
 
< 0.1%
( 46
 
< 0.1%
) 46
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1311070
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 264267
20.2%
e 180908
13.8%
99351
 
7.6%
a 95334
 
7.3%
i 95149
 
7.3%
n 91503
 
7.0%
d 89343
 
6.8%
- 88895
 
6.8%
S 88183
 
6.7%
s 88122
 
6.7%
Other values (37) 130015
9.9%

citizenship
Categorical

High correlation  Imbalance 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
Native 89168
Foreign 6699
Naturalized 3012

Length

Max length 11
Median length 6
Mean length 6.2200568
Min length 6
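
The length statistics follow from the category counts reported for this variable. As a minimal sketch (the dictionary restates the reported counts; the helper name `length_stats` is hypothetical and not part of the profiling tool), they can be rechecked like this:

```python
# Category counts as reported for `citizenship`.
counts = {"Native": 89168, "Foreign": 6699, "Naturalized": 3012}


def length_stats(value_counts):
    """Return (total_chars, mean_len, min_len, max_len) for count-weighted strings."""
    n_rows = sum(value_counts.values())
    total_chars = sum(len(value) * n for value, n in value_counts.items())
    lengths = [len(value) for value in value_counts]
    return total_chars, total_chars / n_rows, min(lengths), max(lengths)


total_chars, mean_len, min_len, max_len = length_stats(counts)
print(total_chars, round(mean_len, 7), min_len, max_len)
```

The total should agree with the "Total characters" figure in the next section, and the mean with the reported mean length.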

Characters and Unicode

Total characters 615033
Distinct characters 15
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
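
To illustrate how such character properties can be tallied, here is a small sketch using Python's standard `unicodedata` module. It weights each character by the reported category counts; the helper name `category_totals` is hypothetical, and this is not necessarily how the profiling tool computes its figures.

```python
import unicodedata
from collections import Counter

# Category counts as reported for `citizenship`.
counts = {"Native": 89168, "Foreign": 6699, "Naturalized": 3012}


def category_totals(value_counts):
    """Sum Unicode general categories over all characters, weighted by row count."""
    totals = Counter()
    for value, n in value_counts.items():
        for ch in value:
            # e.g. "Ll" = Lowercase Letter, "Lu" = Uppercase Letter
            totals[unicodedata.category(ch)] += n
    return totals


totals = category_totals(counts)
print(totals["Ll"], totals["Lu"])
```

Under these assumptions the two totals should line up with the "Lowercase Letter" and "Uppercase Letter" rows of the category table for this variable.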

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Foreign
2nd row Native
3rd row Native
4th row Native
5th row Native

Common Values

Value Count Frequency (%)
Native 89168 90.2%
Foreign 6699 6.8%
Naturalized 3012 3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
native 89168 90.2%
foreign 6699 6.8%
naturalized 3012 3.0%

Most occurring characters

ValueCountFrequency (%)
i 98879
16.1%
e 98879
16.1%
a 95192
15.5%
N 92180
15.0%
t 92180
15.0%
v 89168
14.5%
r 9711
 
1.6%
F 6699
 
1.1%
o 6699
 
1.1%
g 6699
 
1.1%
Other values (5) 18747
 
3.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 516154
83.9%
Uppercase Letter 98879
 
16.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 98879
19.2%
e 98879
19.2%
a 95192
18.4%
t 92180
17.9%
v 89168
17.3%
r 9711
 
1.9%
o 6699
 
1.3%
g 6699
 
1.3%
n 6699
 
1.3%
u 3012
 
0.6%
Other values (3) 9036
 
1.8%
Uppercase Letter
ValueCountFrequency (%)
N 92180
93.2%
F 6699
 
6.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 615033
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 98879
16.1%
e 98879
16.1%
a 95192
15.5%
N 92180
15.0%
t 92180
15.0%
v 89168
14.5%
r 9711
 
1.6%
F 6699
 
1.1%
o 6699
 
1.1%
g 6699
 
1.1%
Other values (5) 18747
 
3.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 615033
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 98879
16.1%
e 98879
16.1%
a 95192
15.5%
N 92180
15.0%
t 92180
15.0%
v 89168
14.5%
r 9711
 
1.6%
F 6699
 
1.1%
o 6699
 
1.1%
g 6699
 
1.1%
Other values (5) 18747
 
3.0%

own_business_or_self_employed
Categorical

Imbalance 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
Not in universe 89305
No 8234
Yes 1340

Length

Max length 15
Median length 15
Mean length 13.754822
Min length 2

Characters and Unicode

Total characters 1360063
Distinct characters 12
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row No
5th row Not in universe

Common Values

Value Count Frequency (%)
Not in universe 89305 90.3%
No 8234 8.3%
Yes 1340 1.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
not 89305 32.2%
in 89305 32.2%
universe 89305 32.2%
no 8234 3.0%
yes 1340 0.5%

Most occurring characters

ValueCountFrequency (%)
e 179950
13.2%
178610
13.1%
i 178610
13.1%
n 178610
13.1%
N 97539
7.2%
o 97539
7.2%
s 90645
6.7%
t 89305
6.6%
u 89305
6.6%
v 89305
6.6%
Other values (2) 90645
6.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1082574
79.6%
Space Separator 178610
 
13.1%
Uppercase Letter 98879
 
7.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 179950
16.6%
i 178610
16.5%
n 178610
16.5%
o 97539
9.0%
s 90645
8.4%
t 89305
8.2%
u 89305
8.2%
v 89305
8.2%
r 89305
8.2%
Uppercase Letter
ValueCountFrequency (%)
N 97539
98.6%
Y 1340
 
1.4%
Space Separator
ValueCountFrequency (%)
178610
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1181453
86.9%
Common 178610
 
13.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 179950
15.2%
i 178610
15.1%
n 178610
15.1%
N 97539
8.3%
o 97539
8.3%
s 90645
7.7%
t 89305
7.6%
u 89305
7.6%
v 89305
7.6%
r 89305
7.6%
Common
ValueCountFrequency (%)
178610
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1360063
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 179950
13.2%
178610
13.1%
i 178610
13.1%
n 178610
13.1%
N 97539
7.2%
o 97539
7.2%
s 90645
6.7%
t 89305
6.6%
u 89305
6.6%
v 89305
6.6%
Other values (2) 90645
6.7%

fill_inc_questionnaire_for_veteran's_admin
Categorical

High correlation  Imbalance 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
Not in universe 97852
No 828
Yes 199

Length

Max length 16
Median length 16
Mean length 15.866989
Min length 3

Characters and Unicode

Total characters 1568912
Distinct characters 12
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not in universe
2nd row Not in universe
3rd row Not in universe
4th row Not in universe
5th row Not in universe

Common Values

Value Count Frequency (%)
Not in universe 97852 99.0%
No 828 0.8%
Yes 199 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
not 97852 33.2%
in 97852 33.2%
universe 97852 33.2%
no 828 0.3%
yes 199 0.1%

Most occurring characters

ValueCountFrequency (%)
294583
18.8%
e 195903
12.5%
i 195704
12.5%
n 195704
12.5%
N 98680
 
6.3%
o 98680
 
6.3%
s 98051
 
6.2%
t 97852
 
6.2%
u 97852
 
6.2%
v 97852
 
6.2%
Other values (2) 98051
 
6.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1175450
74.9%
Space Separator 294583
 
18.8%
Uppercase Letter 98879
 
6.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 195903
16.7%
i 195704
16.6%
n 195704
16.6%
o 98680
8.4%
s 98051
8.3%
t 97852
8.3%
u 97852
8.3%
v 97852
8.3%
r 97852
8.3%
Uppercase Letter
ValueCountFrequency (%)
N 98680
99.8%
Y 199
 
0.2%
Space Separator
ValueCountFrequency (%)
294583
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1274329
81.2%
Common 294583
 
18.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 195903
15.4%
i 195704
15.4%
n 195704
15.4%
N 98680
7.7%
o 98680
7.7%
s 98051
7.7%
t 97852
7.7%
u 97852
7.7%
v 97852
7.7%
r 97852
7.7%
Common
ValueCountFrequency (%)
294583
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1568912
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
294583
18.8%
e 195903
12.5%
i 195704
12.5%
n 195704
12.5%
N 98680
 
6.3%
o 98680
 
6.3%
s 98051
 
6.2%
t 97852
 
6.2%
u 97852
 
6.2%
v 97852
 
6.2%
Other values (2) 98051
 
6.2%

veterans_benefits
Categorical

High correlation 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
Not a Veteran 75256
Not in universe 22596
Veteran 1027

Length

Max length 15
Median length 13
Mean length 13.394725
Min length 7

Characters and Unicode

Total characters 1324457
Distinct characters 13
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Not a Veteran
2nd row Not a Veteran
3rd row Not in universe
4th row Not a Veteran
5th row Not a Veteran

Common Values

Value Count Frequency (%)
Not a Veteran 75256 76.1%
Not in universe 22596 22.9%
Veteran 1027 1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
not 97852 33.2%
veteran 76283 25.9%
a 75256 25.5%
in 22596 7.7%
universe 22596 7.7%
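
The word counts in this table can be rederived from the category counts of `veterans_benefits`. The sketch below assumes the report lowercases each value and splits it on whitespace, which is consistent with the figures shown but is an assumption; `word_totals` is a hypothetical helper name.

```python
from collections import Counter

# Category counts as reported for `veterans_benefits`.
counts = {"Not a Veteran": 75256, "Not in universe": 22596, "Veteran": 1027}


def word_totals(value_counts):
    """Count lowercase whitespace-separated words, weighted by row count."""
    totals = Counter()
    for value, n in value_counts.items():
        for word in value.lower().split():
            totals[word] += n
    return totals


words = word_totals(counts)
all_words = sum(words.values())
for word, n in words.most_common():
    print(word, n, f"{100 * n / all_words:.1f}%")
```

For example, "not" occurs in both "Not a Veteran" and "Not in universe", so its count is the sum of those two categories.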

Most occurring characters

ValueCountFrequency (%)
e 197758
14.9%
195704
14.8%
t 174135
13.1%
a 151539
11.4%
n 121475
9.2%
r 98879
7.5%
N 97852
7.4%
o 97852
7.4%
V 76283
 
5.8%
i 45192
 
3.4%
Other values (3) 67788
 
5.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 954618
72.1%
Space Separator 195704
 
14.8%
Uppercase Letter 174135
 
13.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 197758
20.7%
t 174135
18.2%
a 151539
15.9%
n 121475
12.7%
r 98879
10.4%
o 97852
10.3%
i 45192
 
4.7%
u 22596
 
2.4%
v 22596
 
2.4%
s 22596
 
2.4%
Uppercase Letter
ValueCountFrequency (%)
N 97852
56.2%
V 76283
43.8%
Space Separator
ValueCountFrequency (%)
195704
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1128753
85.2%
Common 195704
 
14.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 197758
17.5%
t 174135
15.4%
a 151539
13.4%
n 121475
10.8%
r 98879
8.8%
N 97852
8.7%
o 97852
8.7%
V 76283
 
6.8%
i 45192
 
4.0%
u 22596
 
2.0%
Other values (2) 45192
 
4.0%
Common
ValueCountFrequency (%)
195704
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1324457
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 197758
14.9%
195704
14.8%
t 174135
13.1%
a 151539
11.4%
n 121475
9.2%
r 98879
7.5%
N 97852
7.4%
o 97852
7.4%
V 76283
 
5.8%
i 45192
 
3.4%
Other values (3) 67788
 
5.1%

weeks_worked_in_year
Real number (ℝ)

High correlation  Zeros 

Distinct 53
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 23.391226
Minimum 0
Maximum 52
Zeros 47009
Zeros (%) 47.5%
Negative 0
Negative (%) 0.0%
Memory size 1.5 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 10
Q3 52
95-th percentile 52
Maximum 52
Range 52
Interquartile range (IQR) 52

Descriptive statistics

Standard deviation 24.398786
Coefficient of variation (CV) 1.0430743
Kurtosis -1.8681688
Mean 23.391226
Median Absolute Deviation (MAD) 10
Skewness 0.19363319
Sum 2312901
Variance 595.30075
Monotonicity Not monotonic
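
Several of these descriptive statistics are tied together by simple identities (CV = standard deviation / mean, variance = standard deviation squared, mean = sum / number of observations). The following sketch rechecks them from the reported figures; the variable names are illustrative only, and small differences are expected because the report rounds its values.

```python
# Figures as reported for `weeks_worked_in_year`.
n_rows = 98879
total = 2312901
mean = 23.391226
std = 24.398786
variance = 595.30075
reported_cv = 1.0430743

# Derived quantities, to be compared with the reported ones.
cv = std / mean              # coefficient of variation
variance_from_std = std ** 2
mean_from_sum = total / n_rows

print(round(cv, 7), round(variance_from_std, 5), round(mean_from_sum, 6))
```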
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
0 47009 47.5%
52 35051 35.4%
40 1362 1.4%
26 1125 1.1%
50 1109 1.1%
48 1010 1.0%
12 952 1.0%
20 725 0.7%
30 725 0.7%
8 564 0.6%
Other values (43) 9247 9.4%

Value Count Frequency (%)
0 47009 47.5%
1 226 0.2%
2 209 0.2%
3 207 0.2%
4 361 0.4%
5 127 0.1%
6 315 0.3%
7 71 0.1%
8 564 0.6%
9 126 0.1%

Value Count Frequency (%)
52 35051 35.4%
51 413 0.4%
50 1109 1.1%
49 308 0.3%
48 1010 1.0%
47 124 0.1%
46 302 0.3%
45 349 0.4%
44 459 0.5%
43 188 0.2%

year
Categorical

High correlation 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
1995 49470
1994 49409

Length

Max length 4
Median length 4
Mean length 4
Min length 4

Characters and Unicode

Total characters 395516
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1995
2nd row 1995
3rd row 1995
4th row 1994
5th row 1995

Common Values

Value Count Frequency (%)
1995 49470 50.0%
1994 49409 50.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
1995 49470 50.0%
1994 49409 50.0%

Most occurring characters

ValueCountFrequency (%)
9 197758
50.0%
1 98879
25.0%
5 49470
 
12.5%
4 49409
 
12.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 395516
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
9 197758
50.0%
1 98879
25.0%
5 49470
 
12.5%
4 49409
 
12.5%

Most occurring scripts

ValueCountFrequency (%)
Common 395516
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
9 197758
50.0%
1 98879
25.0%
5 49470
 
12.5%
4 49409
 
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 395516
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9 197758
50.0%
1 98879
25.0%
5 49470
 
12.5%
4 49409
 
12.5%

target
Categorical

Imbalance 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 1.5 MiB
1 92693
0 6186

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 98879
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Common Values

Value Count Frequency (%)
1 92693 93.7%
0 6186 6.3%
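
The "Imbalance" flag on `target` reflects how unevenly the two classes are distributed. One common way to quantify this is one minus the normalised Shannon entropy of the class shares (0 for perfectly balanced, 1 for a single class). Whether the report's alert uses exactly this score is an assumption, and the function names below are hypothetical.

```python
import math

# Class counts as reported for `target`.
counts = {"1": 92693, "0": 6186}


def class_shares(value_counts):
    """Fraction of rows falling in each class."""
    n_rows = sum(value_counts.values())
    return {label: n / n_rows for label, n in value_counts.items()}


def imbalance_score(value_counts):
    """1 - H/log2(k): 0 means balanced, 1 means a single class (assumed metric)."""
    k = len(value_counts)
    if k < 2:
        return 1.0
    probs = [p for p in class_shares(value_counts).values() if p > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(k)


shares = class_shares(counts)
score = imbalance_score(counts)
print({label: round(100 * p, 1) for label, p in shares.items()}, round(score, 3))
```

With shares this skewed, a classifier that always predicts the majority class already reaches about 93.7% accuracy, so accuracy alone is a weak metric for this target.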

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
1 92693 93.7%
0 6186 6.3%

Most occurring characters

ValueCountFrequency (%)
1 92693
93.7%
0 6186
 
6.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 98879
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 92693
93.7%
0 6186
 
6.3%

Most occurring scripts

ValueCountFrequency (%)
Common 98879
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 92693
93.7%
0 6186
 
6.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 98879
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 92693
93.7%
0 6186
 
6.3%

Interactions

[Figure: pairwise interaction plots for the 8 numeric variables (64 panels)]

Correlations

age capital_gains capital_losses citizenship class_of_worker country_of_birth_father country_of_birth_mother country_of_birth_self detailed_household_and_family_stat detailed_household_summary_in_household detailed_industry_recode detailed_occupation_recode dividends_from_stocks education enroll_in_edu_inst_last_wk family_members_under_18 fill_inc_questionnaire_for_veteran's_admin full_or_part_time_employment_stat hispanic_origin instance_weight live_in_this_house_1_year_ago major_industry_code major_occupation_code marital_stat member_of_a_labor_union migration_code_change_in_msa migration_code_change_in_reg migration_code_move_within_reg migration_prev_res_in_sunbelt num_persons_worked_for_employer own_business_or_self_employed race reason_for_unemployment region_of_previous_residence sex target tax_filer_stat veterans_benefits wage_per_hour weeks_worked_in_year year
age 1.000 0.126 0.065 0.125 0.366 0.092 0.089 0.065 0.508 0.398 0.245 0.251 0.246 0.444 0.430 0.481 0.086 0.317 0.051 0.009 0.115 0.245 0.245 0.428 0.171 0.073 0.067 0.085 0.108 0.225 0.190 0.054 0.076 0.069 0.063 0.246 0.588 0.643 0.037 0.267 0.009
capital_gains 0.126 1.000 -0.028 0.015 0.048 0.018 0.020 0.016 0.040 0.050 0.051 0.093 0.112 0.072 0.022 0.029 0.000 0.029 0.008 0.010 0.000 0.050 0.062 0.029 0.015 0.000 0.000 0.000 0.004 0.114 0.023 0.011 0.000 0.002 0.056 0.310 0.058 0.037 0.008 0.126 0.000
capital_losses 0.065 -0.028 1.000 0.015 0.049 0.000 0.000 0.000 0.045 0.054 0.040 0.048 0.065 0.054 0.029 0.038 0.006 0.032 0.007 0.007 0.007 0.038 0.040 0.047 0.026 0.004 0.003 0.011 0.008 0.094 0.024 0.010 0.000 0.007 0.076 0.175 0.095 0.053 0.011 0.104 0.000
citizenship 0.125 0.015 0.015 1.000 0.057 0.538 0.542 0.719 0.110 0.113 0.099 0.121 0.000 0.139 0.019 0.090 0.017 0.043 0.395 0.055 0.032 0.086 0.099 0.104 0.012 0.078 0.015 0.075 0.033 0.042 0.021 0.244 0.029 0.075 0.004 0.044 0.056 0.088 0.017 0.033 0.006
class_of_worker 0.366 0.048 0.049 0.057 1.000 0.054 0.053 0.055 0.260 0.277 0.637 0.597 0.013 0.319 0.082 0.276 0.030 0.363 0.049 0.023 0.037 0.650 0.548 0.212 0.277 0.027 0.022 0.050 0.035 0.510 0.210 0.047 0.438 0.027 0.123 0.234 0.487 0.389 0.083 0.448 0.006
country_of_birth_father 0.092 0.018 0.000 0.538 0.054 1.000 0.781 0.668 0.095 0.067 0.029 0.036 0.000 0.114 0.037 0.071 0.017 0.058 0.540 0.057 0.036 0.034 0.049 0.089 0.041 0.046 0.029 0.033 0.049 0.043 0.047 0.442 0.021 0.054 0.024 0.072 0.077 0.079 0.000 0.030 0.032
country_of_birth_mother 0.089 0.020 0.000 0.542 0.053 0.781 1.000 0.683 0.090 0.064 0.029 0.036 0.000 0.111 0.038 0.067 0.014 0.055 0.546 0.056 0.035 0.034 0.048 0.086 0.040 0.048 0.027 0.034 0.052 0.041 0.045 0.446 0.023 0.056 0.022 0.070 0.073 0.074 0.000 0.028 0.029
country_of_birth_self 0.065 0.016 0.000 0.719 0.055 0.668 0.683 1.000 0.091 0.066 0.036 0.044 0.000 0.125 0.034 0.064 0.000 0.043 0.485 0.046 0.030 0.040 0.057 0.074 0.028 0.049 0.023 0.037 0.043 0.037 0.025 0.381 0.031 0.056 0.027 0.058 0.061 0.088 0.000 0.022 0.020
detailed_household_and_family_stat 0.508 0.040 0.045 0.110 0.260 0.095 0.090 0.091 1.000 0.982 0.265 0.275 0.027 0.428 0.205 0.527 0.049 0.181 0.080 0.037 0.068 0.266 0.268 0.489 0.107 0.057 0.038 0.084 0.065 0.258 0.095 0.075 0.040 0.055 0.054 0.196 0.536 0.505 0.052 0.274 0.000
detailed_household_summary_in_household 0.398 0.050 0.054 0.113 0.277 0.067 0.064 0.066 0.982 1.000 0.220 0.235 0.018 0.398 0.339 0.529 0.070 0.217 0.054 0.035 0.068 0.221 0.224 0.419 0.129 0.046 0.037 0.062 0.063 0.229 0.138 0.072 0.062 0.043 0.373 0.224 0.663 0.606 0.036 0.223 0.000
detailed_industry_recode 0.245 0.051 0.040 0.099 0.637 0.029 0.029 0.036 0.265 0.220 1.000 0.426 0.004 0.320 0.127 0.276 0.039 0.361 0.051 0.026 0.034 0.916 0.599 0.194 0.260 0.029 0.017 0.033 0.036 0.405 0.212 0.056 0.148 0.029 0.305 0.283 0.489 0.388 0.069 0.306 0.006
detailed_occupation_recode 0.251 0.093 0.048 0.121 0.597 0.036 0.036 0.044 0.275 0.235 0.426 1.000 0.014 0.403 0.159 0.277 0.043 0.361 0.064 0.022 0.039 0.566 1.000 0.203 0.261 0.031 0.025 0.035 0.039 0.395 0.220 0.069 0.157 0.028 0.392 0.437 0.498 0.388 0.079 0.310 0.008
dividends_from_stocks 0.246 0.112 0.065 0.000 0.013 0.000 0.000 0.000 0.027 0.018 0.004 0.014 1.000 0.039 0.003 0.018 0.007 0.018 0.000 0.010 0.007 0.011 0.010 0.024 0.000 0.000 0.001 0.000 0.003 0.151 0.010 0.007 0.000 0.000 0.009 0.145 0.036 0.025 -0.001 0.155 0.004
education 0.444 0.072 0.054 0.139 0.319 0.114 0.111 0.125 0.428 0.398 0.320 0.403 0.039 1.000 0.326 0.454 0.045 0.280 0.101 0.025 0.022 0.319 0.372 0.299 0.148 0.018 0.024 0.071 0.016 0.285 0.159 0.066 0.064 0.012 0.064 0.377 0.545 0.707 0.051 0.287 0.010
enroll_in_edu_inst_last_wk 0.430 0.022 0.029 0.019 0.082 0.037 0.038 0.034 0.205 0.339 0.127 0.159 0.003 0.326 1.000 0.153 0.014 0.074 0.021 0.011 0.018 0.133 0.116 0.199 0.027 0.019 0.010 0.025 0.020 0.075 0.066 0.023 0.078 0.022 0.015 0.066 0.177 0.103 0.024 0.185 0.008
family_members_under_18 0.481 0.029 0.038 0.090 0.276 0.071 0.067 0.064 0.527 0.529 0.276 0.277 0.018 0.454 0.153 1.000 0.043 0.222 0.066 0.021 0.029 0.276 0.277 0.350 0.128 0.023 0.016 0.073 0.025 0.284 0.126 0.104 0.045 0.021 0.034 0.156 0.532 0.629 0.043 0.284 0.000
fill_inc_questionnaire_for_veteran's_admin 0.086 0.000 0.006 0.017 0.030 0.017 0.014 0.000 0.049 0.070 0.039 0.043 0.007 0.045 0.014 0.043 1.000 0.035 0.015 0.013 0.000 0.040 0.031 0.068 0.006 0.005 0.000 0.011 0.000 0.022 0.005 0.011 0.000 0.002 0.066 0.029 0.024 0.707 0.008 0.018 0.000
full_or_part_time_employment_stat 0.317 0.029 0.032 0.043 0.363 0.058 0.055 0.043 0.181 0.217 0.361 0.361 0.018 0.280 0.074 0.222 0.035 1.000 0.032 0.019 0.551 0.361 0.360 0.191 0.153 0.448 0.551 0.456 0.162 0.310 0.135 0.022 0.074 0.132 0.102 0.153 0.279 0.306 0.055 0.326 0.790
hispanic_origin 0.051 0.008 0.007 0.395 0.049 0.540 0.546 0.485 0.080 0.054 0.051 0.064 0.000 0.101 0.021 0.066 0.015 0.032 1.000 0.061 0.039 0.045 0.054 0.058 0.045 0.038 0.031 0.028 0.057 0.037 0.035 0.157 0.019 0.047 0.011 0.067 0.077 0.070 0.010 0.026 0.040
instance_weight 0.009 0.010 0.007 0.055 0.023 0.057 0.056 0.046 0.037 0.035 0.026 0.022 0.010 0.025 0.011 0.021 0.013 0.019 0.061 1.000 0.031 0.020 0.021 0.021 0.015 0.032 0.026 0.017 0.035 0.042 0.024 0.087 0.011 0.033 0.033 0.011 0.046 0.027 0.023 0.028 0.026
live_in_this_house_1_year_ago 0.115 0.000 0.007 0.032 0.037 0.036 0.035 0.030 0.068 0.068 0.034 0.039 0.007 0.022 0.018 0.029 0.000 0.551 0.039 0.031 1.000 0.032 0.032 0.062 0.007 0.992 0.815 1.000 0.707 0.036 0.048 0.041 0.028 0.707 0.005 0.025 0.051 0.015 0.000 0.036 0.986
major_industry_code 0.245 0.050 0.038 0.086 0.650 0.034 0.034 0.040 0.266 0.221 0.916 0.566 0.011 0.319 0.133 0.276 0.040 0.361 0.045 0.020 0.032 1.000 0.591 0.194 0.260 0.028 0.019 0.032 0.032 0.402 0.212 0.054 0.149 0.025 0.297 0.280 0.489 0.388 0.068 0.306 0.009
major_occupation_code 0.245 0.062 0.040 0.099 0.548 0.049 0.048 0.057 0.268 0.224 0.599 1.000 0.010 0.372 0.116 0.277 0.031 0.360 0.054 0.021 0.032 0.591 1.000 0.197 0.245 0.027 0.019 0.033 0.033 0.378 0.213 0.055 0.144 0.025 0.332 0.366 0.493 0.387 0.069 0.304 0.009
marital_stat 0.428 0.029 0.047 0.104 0.212 0.089 0.086 0.074 0.489 0.419 0.194 0.203 0.024 0.299 0.199 0.350 0.068 0.191 0.058 0.021 0.062 0.194 0.197 1.000 0.093 0.042 0.030 0.060 0.056 0.188 0.077 0.079 0.037 0.038 0.166 0.196 0.718 0.448 0.039 0.199 0.007
member_of_a_labor_union 0.171 0.015 0.026 0.012 0.277 0.041 0.040 0.028 0.107 0.129 0.260 0.261 0.000 0.148 0.027 0.128 0.006 0.153 0.045 0.015 0.007 0.260 0.245 0.093 1.000 0.008 0.006 0.021 0.005 0.226 0.073 0.025 0.041 0.007 0.030 0.072 0.165 0.126 0.350 0.221 0.008
migration_code_change_in_msa 0.073 0.000 0.004 0.078 0.027 0.046 0.048 0.049 0.057 0.046 0.029 0.031 0.000 0.018 0.019 0.023 0.005 0.448 0.038 0.032 0.992 0.028 0.027 0.042 0.008 1.000 0.852 0.794 0.705 0.026 0.050 0.048 0.023 0.630 0.007 0.027 0.053 0.016 0.000 0.027 0.982
migration_code_change_in_reg 0.067 0.000 0.003 0.015 0.022 0.029 0.027 0.023 0.038 0.037 0.017 0.025 0.001 0.024 0.010 0.016 0.000 0.551 0.031 0.026 0.815 0.019 0.019 0.030 0.006 0.852 1.000 1.000 0.440 0.030 0.042 0.038 0.025 0.458 0.006 0.016 0.027 0.016 0.000 0.035 0.986
migration_code_move_within_reg 0.085 0.000 0.011 0.075 0.050 0.033 0.034 0.037 0.084 0.062 0.033 0.035 0.000 0.071 0.025 0.073 0.011 0.456 0.028 0.017 1.000 0.032 0.033 0.060 0.021 0.794 1.000 1.000 0.738 0.042 0.058 0.041 0.024 0.708 0.007 0.037 0.094 0.109 0.000 0.036 1.000
migration_prev_res_in_sunbelt 0.108 0.004 0.008 0.033 0.035 0.049 0.052 0.043 0.065 0.063 0.036 0.039 0.003 0.016 0.020 0.025 0.000 0.162 0.057 0.035 0.707 0.032 0.033 0.056 0.005 0.705 0.440 0.738 1.000 0.029 0.046 0.025 0.027 0.867 0.001 0.024 0.048 0.002 0.000 0.036 0.290
num_persons_worked_for_employer 0.225 0.114 0.094 0.042 0.510 0.043 0.041 0.037 0.258 0.229 0.405 0.395 0.151 0.285 0.075 0.284 0.022 0.310 0.037 0.042 0.036 0.402 0.378 0.188 0.226 0.026 0.030 0.042 0.029 1.000 0.223 0.047 0.060 0.021 0.109 0.237 0.522 0.406 0.224 0.878 0.034
own_business_or_self_employed 0.190 0.023 0.024 0.021 0.210 0.047 0.045 0.025 0.095 0.138 0.212 0.220 0.010 0.159 0.066 0.126 0.005 0.135 0.035 0.024 0.048 0.212 0.213 0.077 0.073 0.050 0.042 0.058 0.046 0.223 1.000 0.032 0.045 0.049 0.047 0.078 0.185 0.126 0.025 0.238 0.013
race 0.054 0.011 0.010 0.244 0.047 0.442 0.446 0.381 0.075 0.072 0.056 0.069 0.007 0.066 0.023 0.104 0.011 0.022 0.157 0.087 0.041 0.054 0.055 0.079 0.025 0.048 0.038 0.041 0.025 0.047 0.032 1.000 0.023 0.043 0.019 0.059 0.105 0.055 0.005 0.040 0.045
reason_for_unemployment 0.076 0.000 0.000 0.029 0.438 0.021 0.023 0.031 0.040 0.062 0.148 0.157 0.000 0.064 0.078 0.045 0.000 0.074 0.019 0.011 0.028 0.149 0.144 0.037 0.041 0.023 0.025 0.024 0.027 0.060 0.045 0.023 1.000 0.021 0.046 0.029 0.075 0.070 0.009 0.116 0.018
region_of_previous_residence 0.069 0.002 0.007 0.075 0.027 0.054 0.056 0.056 0.055 0.043 0.029 0.028 0.000 0.012 0.022 0.021 0.002 0.132 0.047 0.033 0.707 0.025 0.025 0.038 0.007 0.630 0.458 0.708 0.867 0.021 0.049 0.043 0.021 1.000 0.008 0.026 0.049 0.000 0.000 0.026 0.290
sex 0.063 0.056 0.076 0.004 0.123 0.024 0.022 0.027 0.054 0.373 0.305 0.392 0.009 0.064 0.015 0.034 0.066 0.102 0.011 0.033 0.005 0.297 0.332 0.166 0.030 0.007 0.006 0.007 0.001 0.109 0.047 0.019 0.046 0.008 1.000 0.159 0.037 0.072 0.036 0.116 0.000
target 0.246 0.310 0.175 0.044 0.234 0.072 0.070 0.058 0.196 0.224 0.283 0.437 0.145 0.377 0.066 0.156 0.029 0.153 0.067 0.011 0.025 0.280 0.366 0.196 0.072 0.027 0.016 0.037 0.024 0.237 0.078 0.059 0.029 0.026 0.159 1.000 0.220 0.142 0.067 0.269 0.019
tax_filer_stat 0.588 0.058 0.095 0.056 0.487 0.077 0.073 0.061 0.536 0.663 0.489 0.498 0.036 0.545 0.177 0.532 0.024 0.279 0.077 0.046 0.051 0.489 0.493 0.718 0.165 0.053 0.027 0.094 0.048 0.522 0.185 0.105 0.075 0.049 0.037 0.220 1.000 0.504 0.078 0.532 0.003
veterans_benefits 0.643 0.037 0.053 0.088 0.389 0.079 0.074 0.088 0.505 0.606 0.388 0.388 0.025 0.707 0.103 0.629 0.707 0.306 0.070 0.027 0.015 0.388 0.387 0.448 0.126 0.016 0.016 0.109 0.002 0.406 0.126 0.055 0.070 0.000 0.072 0.142 0.504 1.000 0.056 0.397 0.000
wage_per_hour 0.037 0.008 0.011 0.017 0.083 0.000 0.000 0.000 0.052 0.036 0.069 0.079 -0.001 0.051 0.024 0.043 0.008 0.055 0.010 0.023 0.000 0.068 0.069 0.039 0.350 0.000 0.000 0.000 0.000 0.224 0.025 0.005 0.009 0.000 0.036 0.067 0.078 0.056 1.000 0.216 0.000
weeks_worked_in_year 0.267 0.126 0.104 0.033 0.448 0.030 0.028 0.022 0.274 0.223 0.306 0.310 0.155 0.287 0.185 0.284 0.018 0.326 0.026 0.028 0.036 0.306 0.304 0.199 0.221 0.027 0.035 0.036 0.036 0.878 0.238 0.040 0.116 0.026 0.116 0.269 0.532 0.397 0.216 1.000 0.007
year 0.009 0.000 0.000 0.006 0.006 0.032 0.029 0.020 0.000 0.000 0.006 0.008 0.004 0.010 0.008 0.000 0.000 0.790 0.040 0.026 0.986 0.009 0.009 0.007 0.008 0.982 0.986 1.000 0.290 0.034 0.013 0.045 0.018 0.290 0.000 0.019 0.003 0.000 0.000 0.007 1.000
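
To see which pairs would plausibly trigger a "High correlation" alert, one can filter the coefficients by absolute value. The sketch below uses a handful of coefficients copied from the matrix above, not the full table. The 0.5 threshold is an assumption for illustration, not a value confirmed by this report, and `strong_pairs` is a hypothetical helper.

```python
# A few coefficients copied from the correlation matrix (illustrative subset).
pairs = {
    ("year", "live_in_this_house_1_year_ago"): 0.986,
    ("detailed_household_and_family_stat",
     "detailed_household_summary_in_household"): 0.982,
    ("weeks_worked_in_year", "num_persons_worked_for_employer"): 0.878,
    ("veterans_benefits", "fill_inc_questionnaire_for_veteran's_admin"): 0.707,
    ("target", "capital_gains"): 0.310,
    ("age", "instance_weight"): 0.009,
}

THRESHOLD = 0.5  # assumed cut-off for a "high" correlation


def strong_pairs(coefficients, threshold):
    """Return (a, b, r) tuples with |r| >= threshold, strongest first."""
    hits = [(a, b, r) for (a, b), r in coefficients.items() if abs(r) >= threshold]
    return sorted(hits, key=lambda item: abs(item[2]), reverse=True)


for a, b, r in strong_pairs(pairs, THRESHOLD):
    print(f"{a} ~ {b}: {r:.3f}")
```

Note that the strongest association of `target` in this subset (0.310 with `capital_gains`) falls below the assumed threshold.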

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
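
The report lists 0 missing cells, yet the sample rows show `?` in `migration_code_move_within_reg`. A nullity check only sees true nulls, so string placeholders pass as ordinary values. The sketch below contrasts the two counts on three made-up rows. Treating `?` as a marker for unknown values is an assumption, and both helper names are hypothetical.

```python
from collections import Counter

# Assumption: "?" stands for an unknown value rather than a real category.
PLACEHOLDERS = {"?"}

# Three illustrative rows (hypothetical subset of columns).
rows = [
    {"citizenship": "Foreign", "migration_code_move_within_reg": "?"},
    {"citizenship": "Native", "migration_code_move_within_reg": "Nonmover"},
    {"citizenship": "Native", "migration_code_move_within_reg": "?"},
]


def count_nulls(records):
    """Per-column count of cells that are literally None."""
    nulls = Counter()
    for record in records:
        for column, value in record.items():
            nulls[column] += value is None
    return dict(nulls)


def count_placeholders(records, placeholders=PLACEHOLDERS):
    """Per-column count of cells holding a placeholder string."""
    hits = Counter()
    for record in records:
        for column, value in record.items():
            hits[column] += value in placeholders
    return dict(hits)


print(count_nulls(rows))
print(count_placeholders(rows))
```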

Sample

row | age | class_of_worker | detailed_industry_recode | detailed_occupation_recode | education | wage_per_hour | enroll_in_edu_inst_last_wk | marital_stat | major_industry_code | major_occupation_code | race | hispanic_origin | sex | member_of_a_labor_union | reason_for_unemployment | full_or_part_time_employment_stat | capital_gains | capital_losses | dividends_from_stocks | tax_filer_stat | region_of_previous_residence | state_of_previous_residence | detailed_household_and_family_stat | detailed_household_summary_in_household | instance_weight | migration_code_change_in_msa | migration_code_change_in_reg | migration_code_move_within_reg | live_in_this_house_1_year_ago | migration_prev_res_in_sunbelt | num_persons_worked_for_employer | family_members_under_18 | country_of_birth_father | country_of_birth_mother | country_of_birth_self | citizenship | own_business_or_self_employed | fill_inc_questionnaire_for_veteran's_admin | veterans_benefits | weeks_worked_in_year | year | target
0 | 38 | Private sector | Transportation | Construction trades | Below High School | 0 | Not in universe | Married | Manufacturing-durable goods | Machine operators assmblrs & inspctrs | White | Mexican (Mexicano) | Female | Not in universe | Not in universe | FTE | 0 | 0 | 0 | Joint Filer | Not in universe | Not in universe | Primary Householder | Spouse of householder | 1032.38 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 4 | Not in universe | Mexico | Mexico | Mexico | Foreign | Not in universe | Not in universe | Not a Veteran | 12 | 1995 | 1
1 | 44 | Self-employed | Wholesale and retail trade | Other professional specialty occupations | Some College | 0 | Not in universe | Married | Business and repair services | Professional specialty | White | All other | Female | Not in universe | Not in universe | PTE | 0 | 0 | 2500 | Joint Filer | Not in universe | Not in universe | Primary Householder | Spouse of householder | 1462.33 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 1 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 26 | 1995 | 1
2 | 2 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | Mexican-American | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1601.75 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1995 | 1
3 | 35 | Private sector | Business and repair services | Management related occupations | High School Graduate | 0 | Not in universe | Divorced | Transportation | Executive admin and managerial | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Individual Filer | Not in universe | Not in universe | Primary Householder | Householder | 1866.88 | No movement | Same area | Nonmover | Yes | Not in universe | 5 | Not in universe | United-States | United-States | United-States | Native | No | Not in universe | Not a Veteran | 52 | 1994 | 1
4 | 49 | Private sector | Manufacturing-durable goods | Automobile mechanics and repairers | High School Graduate | 0 | Not in universe | Divorced | Construction | Precision production craft & repair | White | All other | Male | Not in universe | Not in universe | FTE | 0 | 0 | 0 | Individual Filer | Not in universe | Not in universe | Other | Nonrelative of householder | 1394.54 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 4 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 50 | 1995 | 1
5 | 13 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 2556.34 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Both parents present | Germany | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1994 | 1
6 | 1 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | Mexican-American | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1723.61 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Both parents present | Mexico | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1994 | 1
7 | 61 | Not in universe | Not in universe or children | Not in universe | High School Graduate | 0 | Not in universe | Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Not Employed | 0 | 0 | 0 | Joint Filer | Not in universe | Not in universe | Primary Householder | Spouse of householder | 1083.03 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1995 | 1
838Private sectorTradeOther professional specialty occupationsAdvanced Degree0Not in universeMarriedOther professional servicesProfessional specialtyBlackAll otherMaleNot in universeNot in universeChildren or Armed Forces000Joint FilerNot in universeNot in universePrimary HouseholderHouseholder1767.95No movementSame areaNonmoverYesNot in universe1Not in universeUnited-StatesUnited-StatesUnited-StatesNativeNot in universeNot in universeNot a Veteran5219941
97Not in universeNot in universe or childrenNot in universeChildren0Not in universeNever MarriedNot in universe or childrenNot in universeWhiteAll otherFemaleNot in universeNot in universeChildren or Armed Forces000Non-FilerNot in universeNot in universeChildChild under 18 never married1595.19No movementSame areaNonmoverYesNot in universe0Both parents presentUnited-StatesUnited-StatesUnited-StatesNativeNot in universeNot in universeNot in universe019941
| | age | class_of_worker | detailed_industry_recode | detailed_occupation_recode | education | wage_per_hour | enroll_in_edu_inst_last_wk | marital_stat | major_industry_code | major_occupation_code | race | hispanic_origin | sex | member_of_a_labor_union | reason_for_unemployment | full_or_part_time_employment_stat | capital_gains | capital_losses | dividends_from_stocks | tax_filer_stat | region_of_previous_residence | state_of_previous_residence | detailed_household_and_family_stat | detailed_household_summary_in_household | instance_weight | migration_code_change_in_msa | migration_code_change_in_reg | migration_code_move_within_reg | live_in_this_house_1_year_ago | migration_prev_res_in_sunbelt | num_persons_worked_for_employer | family_members_under_18 | country_of_birth_father | country_of_birth_mother | country_of_birth_self | citizenship | own_business_or_self_employed | fill_inc_questionnaire_for_veteran's_admin | veterans_benefits | weeks_worked_in_year | year | target |
| 99751 | 22 | Private sector | Manufacturing | Food service occupations | Some College | 0 | College or university | Never Married | Education | Adm support including clerical | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Individual Filer | Not in universe | Not in universe | Primary Householder | Householder | 1164.53 | No movement | Same area | Nonmover | Yes | Not in universe | 6 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 52 | 1994 | 1 |
| 99752 | 46 | Private sector | Transport, communications, utilities | Management related occupations | Some College | 0 | Not in universe | Married | Social services | Executive admin and managerial | Black | All other | Female | Not in universe | Not in universe | FTE | 0 | 0 | 0 | Joint Filer | Not in universe | Not in universe | Primary Householder | Spouse of householder | 1197.34 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 1 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 52 | 1995 | 1 |
| 99753 | 2 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1858.67 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1995 | 1 |
| 99754 | 17 | Private sector | Public administration | Personal service occupations | Below High School | 0 | High school | Never Married | Retail trade | Other service | White | All other | Female | Not in universe | Not in universe | FTE | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1414.11 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 6 | Mother only present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 1 | 1995 | 1 |
| 99755 | 20 | Not in universe | Not in universe or children | Not in universe | Some College | 0 | College or university | Never Married | Not in universe or children | Not in universe | White | All other | Male | Not in universe | Not in universe | Not Employed | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Other | Nonrelative of householder | 1544.21 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1995 | 1 |
| 99756 | 4 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | Mexican-American | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1335.91 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1995 | 1 |
| 99758 | 61 | Private sector | Utilities and sanitary services | Construction trades | Below High School | 0 | Not in universe | Separated | Manufacturing-durable goods | Machine operators assmblrs & inspctrs | Black | All other | Male | No | Not in universe | FTE | 0 | 0 | 0 | Individual Filer | Not in universe | Not in universe | Primary Householder | Householder | 2511.11 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 4 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 52 | 1995 | 1 |
| 99759 | 24 | Self-employed | Agriculture | Other transportation and material moving | Below High School | 0 | Not in universe | Married | Agriculture | Farming forestry and fishing | White | Mexican (Mexicano) | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Joint Filer | Not in universe | Not in universe | Other | Nonrelative of householder | 2083.76 | No movement | Same area | Nonmover | Yes | Not in universe | 2 | Not in universe | Mexico | Mexico | Mexico | Naturalized | Not in universe | Not in universe | Not a Veteran | 52 | 1994 | 1 |
| 99760 | 30 | Private sector | Trade | Other executive, admin and managerial | College Graduate | 0 | Not in universe | Married | Other professional services | Executive admin and managerial | White | All other | Female | Not in universe | Not in universe | FTE | 0 | 0 | 0 | Joint Filer | Not in universe | Not in universe | Primary Householder | Spouse of householder | 1680.06 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 5 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 52 | 1995 | 1 |
| 99761 | 67 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | Not in universe | Married | Not in universe or children | Not in universe | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Joint Filer | Not in universe | Not in universe | Primary Householder | Householder | 1582.48 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Not in universe | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1994 | 1 |

Duplicate rows

Most frequently occurring

| | age | class_of_worker | detailed_industry_recode | detailed_occupation_recode | education | wage_per_hour | enroll_in_edu_inst_last_wk | marital_stat | major_industry_code | major_occupation_code | race | hispanic_origin | sex | member_of_a_labor_union | reason_for_unemployment | full_or_part_time_employment_stat | capital_gains | capital_losses | dividends_from_stocks | tax_filer_stat | region_of_previous_residence | state_of_previous_residence | detailed_household_and_family_stat | detailed_household_summary_in_household | instance_weight | migration_code_change_in_msa | migration_code_change_in_reg | migration_code_move_within_reg | live_in_this_house_1_year_ago | migration_prev_res_in_sunbelt | num_persons_worked_for_employer | family_members_under_18 | country_of_birth_father | country_of_birth_mother | country_of_birth_self | citizenship | own_business_or_self_employed | fill_inc_questionnaire_for_veteran's_admin | veterans_benefits | weeks_worked_in_year | year | target | # duplicates |
| 0 | 1 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | Black | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Extended Family | Other relative of householder | 3556.10 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Mother only present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1994 | 1 | 2 |
| 1 | 5 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | Black | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Extended Family | Other relative of householder | 538.04 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Mother only present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1995 | 1 | 2 |
| 2 | 5 | Not in universe | Not in universe or children | Not in universe | Children | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Extended Family | Other relative of householder | 1958.46 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Neither parent present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not in universe | 0 | 1994 | 1 | 2 |
| 3 | 15 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | Black | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Extended Family | Other relative of householder | 1334.77 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Mother only present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1994 | 1 | 2 |
| 4 | 15 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1978.23 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1994 | 1 | 2 |
| 5 | 15 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 2100.03 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1994 | 1 | 2 |
| 6 | 15 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Not Employed | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 993.45 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1995 | 1 | 2 |
| 7 | 15 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Not Employed | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1332.77 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1995 | 1 | 2 |
| 8 | 15 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Female | Not in universe | Not in universe | Not Employed | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 2575.48 | Not in universe | Not in universe | ? | Not in universe | Not in universe | 0 | Both parents present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1995 | 1 | 2 |
| 9 | 15 | Not in universe | Not in universe or children | Not in universe | Below High School | 0 | Not in universe | Never Married | Not in universe or children | Not in universe | White | All other | Male | Not in universe | Not in universe | Children or Armed Forces | 0 | 0 | 0 | Non-Filer | Not in universe | Not in universe | Child | Child under 18 never married | 1022.09 | No movement | Same area | Nonmover | Yes | Not in universe | 0 | Mother only present | United-States | United-States | United-States | Native | Not in universe | Not in universe | Not a Veteran | 0 | 1994 | 1 | 2 |
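
A table like "Most frequently occurring" can be approximated outside the report with pandas. The following is a minimal sketch, not necessarily how the profiling tool computes it: the function name `most_frequent_duplicates` and the toy frame are hypothetical, and only the `# duplicates` column name is taken from the report. It keeps rows that have at least one exact copy, groups on all columns, and counts each group.

```python
import pandas as pd


def most_frequent_duplicates(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n most frequent fully duplicated rows with their counts.

    Hypothetical helper for illustration; not part of any profiling library.
    """
    # Keep every row that has at least one exact copy elsewhere in the frame.
    dupes = df[df.duplicated(keep=False)]
    # Group on all columns; dropna=False keeps groups whose keys contain NaN.
    counts = (
        dupes.groupby(list(df.columns), dropna=False)
        .size()
        .reset_index(name="# duplicates")
    )
    # Most frequent duplicate groups first.
    return (
        counts.sort_values("# duplicates", ascending=False)
        .head(n)
        .reset_index(drop=True)
    )


# Toy data (made up): one row repeated twice, one repeated three times, one unique.
toy = pd.DataFrame(
    {
        "age": [1, 1, 5, 5, 5, 15],
        "race": ["Black", "Black", "White", "White", "White", "White"],
        "year": [1994, 1994, 1995, 1995, 1995, 1994],
    }
)
result = most_frequent_duplicates(toy)
print(result)
```

In this sketch each group's count is the total number of occurrences of that row, so a row that appears twice gets a count of 2.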